Glycan Structures
Represent polysaccharide molecules and their associated functions
Represent a sugar graph with pseudo-directed edges.
- class glypy.structure.glycan.Glycan(root=None, index_method='dfs', canonicalize=False)
Represents a full graph of connected Monosaccharide objects and their connecting bonds.
- Variables
root (Monosaccharide) – The first monosaccharide unit of the glycan, and the reducing end if present.
index (list) – A list of the Monosaccharide instances in self, in the order they are encountered when traversing with traversal_methods[index_method]
link_index (list) – A list of the Link objects connecting the Monosaccharide instances in self, in the order they are encountered when traversing with traversal_methods[index_method]
reducing_end (ReducedEnd or None) – The reducing end on root.
branch_lengths (dict) – A dictionary mapping branch symbols to their lengths
branch_parent_map (dict) – A dictionary mapping branch symbols to their parent branch symbols
Glycan Methods
Indexing
Glycans support __getitem__() on index, as well as several other methods related to finding elements and building and maintaining unique indices.
- Glycan.__getitem__(ix)
Fetch a Monosaccharide from index.
- Return type
Monosaccharide
- Raises
IndexError – If the provided ix exceeds the length of the index, or if index has not been populated.
- Glycan.get(ix)
Get a Monosaccharide from this structure by its id value.
If index is populated, it will be iterated over; otherwise __iter__() will be called.
- Parameters
ix – The id value of the Monosaccharide to find
- Return type
Monosaccharide
- Raises
IndexError – If the id value is not found
- Glycan.get_link(ix)
Search for a Link by its id value.
This uses iterlinks() to iterate over the linkages in the structure.
Sizing
- Glycan.order(deep=False)
The number of nodes in the graph. __len__() is an alias of this method.
- Return type
int
Different branches may have different lengths. An indexed Glycan's branch_lengths dict maps each branch label to that branch's length. When an existing branch forks, each child branch is given a new label. The parent branch is then taken to be as long as its longest child, and each child branch is at least one unit longer than its parent was at the fork.
Ordering and Index Building
- Glycan.reroot(index_method='dfs')
Set root to the node with the lowest id. This should only be used if the glycan has been indexed.
- Glycan.reindex(method='dfs')
Traverse the graph using the function specified by method. The order of traversal defines the new id value for each Monosaccharide and Link.
The order of traversal also defines the ordering of the Monosaccharide instances in index and the Link instances in link_index.
Prior to constructing a Glycan instance, component Monosaccharide instances may be labeled, converting their id field into a tuple.
Calls label_branches() after indexing is complete.
- Returns
self
- Return type
Glycan
- Glycan.deindex()
When two Glycan structures are combined, their component ids will very often overlap, making it impossible to tell a cycle apart from the new graph. This method mangles all of the node and link ids so that they are distinct from those of the pre-existing nodes.
- Returns
self
- Return type
Glycan
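The following minimal sketch shows what reindex() and deindex() do to id values. It uses a hypothetical Node class, not glypy's Monosaccharide. It assumes a plain depth-first order with children visited in list order and renumbers ids from 1 in visit order. It models deindex() as shifting every id above the largest id already in use by another structure. The real method's mangling scheme may differ.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass(eq=False)
class Node:
    """Hypothetical node with a mutable id and an ordered list of children."""
    id: int
    children: List["Node"] = field(default_factory=list)


def dfs(root: Node) -> Iterator[Node]:
    """Depth-first walk of a tree (no cycles), children in list order."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(reversed(node.children))


def reindex(root: Node) -> List[Node]:
    """Renumber ids 1..n in traversal order and return that order as the index."""
    index = list(dfs(root))
    for new_id, node in enumerate(index, start=1):
        node.id = new_id
    return index


def deindex(root: Node, taken_ids: Iterable[int]) -> None:
    """Shift every id above all ids used elsewhere (hypothetical scheme; assumes ids >= 1)."""
    offset = max(taken_ids, default=0)
    for node in dfs(root):
        node.id += offset


a = Node(7, [Node(9, [Node(2)]), Node(4)])
index = reindex(a)
print([n.id for n in index])  # [1, 2, 3, 4]

b = Node(1, [Node(2)])  # these ids collide with a's ids
deindex(b, taken_ids=[n.id for n in index])
print([n.id for n in dfs(b)])  # [5, 6]
```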
- Glycan.label_branches()
Labels each branch point with an alphabetical symbol. Computes each branch's length and stores it in the branch_lengths attribute of self, sets Link.label for each link attached to self, and populates branch_parent_map.
Branch symbols are increasing alphabetical characters. The root branch is denoted '-', though glycans whose root has multiple children will not have any actual branches with that label.
Link.label updates use the current branch symbol and the index of that link along that branch.
Note
Labeling always uses a depth-first traversal of nodes.
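The branch-labeling rules can be approximated with a simplified sketch on a hypothetical Node class. A node with several children opens a new lettered branch for each child link, and a single child continues the current branch. Link labels join the branch symbol to the link's depth from the root. This is an illustration only, not glypy's implementation, and it supports at most 26 branches.

```python
import string
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(eq=False)
class Node:
    """Hypothetical node: a name and an ordered list of children."""
    name: str
    children: List["Node"] = field(default_factory=list)


def label_branches(root: Node):
    """Return (link_labels, branch_lengths, branch_parent_map) for a toy tree."""
    letters = iter(string.ascii_lowercase)  # at most 26 branches in this sketch
    link_labels: Dict[Tuple[str, str], str] = {}
    branch_lengths: Dict[str, int] = {"-": 0}
    branch_parent_map: Dict[str, str] = {}
    stack = [(root, "-")]  # (node, branch the node sits on)
    while stack:
        node, branch = stack.pop()
        forks = len(node.children) > 1
        pending = []
        for child in node.children:
            if forks:
                # Each child of a fork starts a new branch, one longer than its parent so far.
                child_branch = next(letters)
                branch_parent_map[child_branch] = branch
                branch_lengths[child_branch] = branch_lengths[branch] + 1
            else:
                # A single child continues the current branch.
                child_branch = branch
                branch_lengths[branch] += 1
            link_labels[(node.name, child.name)] = f"{child_branch}{branch_lengths[child_branch]}"
            pending.append((child, child_branch))
        stack.extend(reversed(pending))  # depth-first, children in list order
    # Letters are handed out parent-before-child, so walking them in reverse
    # lets each parent branch grow to the length of its longest child.
    for child_branch in sorted(branch_parent_map, reverse=True):
        parent = branch_parent_map[child_branch]
        branch_lengths[parent] = max(branch_lengths[parent], branch_lengths[child_branch])
    return link_labels, branch_lengths, branch_parent_map


tree = Node("r", [Node("x", [Node("y", [Node("w")]), Node("z")])])
labels, lengths, parents = label_branches(tree)
print(labels)   # {('r', 'x'): '-1', ('x', 'y'): 'a2', ('x', 'z'): 'b2', ('y', 'w'): 'a3'}
print(lengths)  # {'-': 3, 'a': 3, 'b': 2}
print(parents)  # {'a': '-', 'b': '-'}
```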
Traversal
Glycan structures may be linear or branching, and can be traversed in many ways. By default, Glycan.depth_first_traversal() is used, which fully traverses one branch before visiting another, but other methods are available. Some methods only control the behavior of the iterator, not the order of iteration, and take a method argument that specifies either the name of a traversal method or a callable.
Glycan objects implement the Iterable interface, and their __iter__() method uses Glycan.depth_first_traversal().
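The sketch below contrasts the "dfs" and "bfs" strategies and shows the roles of apply_fn and visited. It uses a hypothetical Node class whose children are keyed by linkage position. Reading "descending bond order" as "highest linkage position first" is an assumption of this sketch.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, Optional, Set


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for a monosaccharide: an id and children keyed by linkage position."""
    id: int
    children: Dict[int, "Node"] = field(default_factory=dict)


def identity(node):
    return node


def depth_first(root: Node, apply_fn: Callable = identity,
                visited: Optional[Set[int]] = None) -> Iterator:
    """Visit one branch fully before the next; highest linkage position first."""
    visited = set() if visited is None else visited
    stack = [root]
    while stack:
        node = stack.pop()
        if node.id in visited:
            continue
        visited.add(node.id)
        result = apply_fn(node)
        if result is not None:  # None results are skipped
            yield result
        # Push in ascending position order so the highest position is popped first.
        for _, child in sorted(node.children.items()):
            stack.append(child)


def breadth_first(root: Node, apply_fn: Callable = identity,
                  visited: Optional[Set[int]] = None) -> Iterator:
    """Visit nodes level by level; each node's children are queued highest position first."""
    visited = set() if visited is None else visited
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node.id in visited:
            continue
        visited.add(node.id)
        result = apply_fn(node)
        if result is not None:
            yield result
        for _, child in sorted(node.children.items(), reverse=True):
            queue.append(child)


# Toy tree: 1 -(3)-> 2 -(4)-> 5 and 1 -(6)-> 3 -(4)-> 4
root = Node(1, {3: Node(2, {4: Node(5)}), 6: Node(3, {4: Node(4)})})
print([n.id for n in depth_first(root)])                     # [1, 3, 4, 2, 5]
print(list(breadth_first(root, apply_fn=lambda n: n.id)))    # [1, 3, 2, 4, 5]
print(list(depth_first(root, lambda n: n.id, visited={3})))  # [1, 2, 5]
```

Marking node 3 as visited skips it and everything reachable only through it.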
- Glycan.depth_first_traversal(from_node=None, apply_fn=<function identity>, visited=None)
Make a depth-first traversal of the glycan graph. Children are explored in descending bond order.
This is the default traversal method for all Glycan objects. dfs() is an alias of this method. Both names can be used to specify this strategy to _get_traversal_method().
When selecting an iteration strategy, this strategy is specified as "dfs".
- Parameters
from_node (None or Monosaccharide) – If from_node is None, then traversal starts from the root node. Otherwise it begins from the given node.
apply_fn (function) – A function applied to each node on arrival. If this function returns a non-None value, the result is yielded from the generator; otherwise it is ignored. Defaults to identity()
visited (set or None) – A set of node id values to ignore. If None, defaults to the empty set
- Yields
Return value of apply_fn, by default Monosaccharide
- Glycan.breadth_first_traversal(from_node=None, apply_fn=<function identity>, visited=None)
Make a breadth-first traversal of the glycan graph. Children are explored in descending bond order.
When selecting an iteration strategy, this strategy is specified as "bfs".
- Parameters
from_node (None or Monosaccharide) – If from_node is None, then traversal starts from the root node. Otherwise it begins from the given node.
apply_fn (function) – A function applied to each node on arrival. If this function returns a non-None value, the result is yielded from the generator; otherwise it is ignored. Defaults to identity()
visited (set or None) – A set of node id values to ignore. If None, defaults to the empty set
- Yields
Return value of apply_fn, by default Monosaccharide
- Glycan.indexed_traversal(from_node=None, apply_fn=<function identity>, visited=None)
Traverse the glycan structure along index.
This is substantially faster than other traversal methods for complete traversals, at the cost of a) requiring a call to reindex() to populate index if it has not already been called, and b) not being updated automatically if the structure is modified.
When selecting an iteration strategy, this strategy is specified as "index".
- Parameters
from_node (None or Monosaccharide) – If from_node is None, then traversal starts from the root node. Otherwise it begins from the given node.
apply_fn (function) – A function applied to each node on arrival. If this function returns a non-None value, the result is yielded from the generator; otherwise it is ignored. Defaults to identity()
visited (set or None) – A set of node id values to ignore. If None, defaults to the empty set
- Yields
Return value of apply_fn, by default Monosaccharide
- Glycan.iternodes(from_node=None, apply_fn=<function identity>, method='dfs', visited=None)
Generic iterator over nodes, dispatching to the strategy given by method and defaulting to depth_first_traversal().
- Parameters
from_node (None or Monosaccharide) – If from_node is None, then traversal starts from the root node. Otherwise it begins from the given node.
apply_fn (function) – A function applied to each node on arrival. If this function returns a non-None value, the result is yielded from the generator; otherwise it is ignored. Defaults to identity()
method (str or function) – Traversal method to use. See _get_traversal_method()
visited (set or None) – A set of node id values to ignore. If None, defaults to the empty set
- Yields
Return value of apply_fn, by default Monosaccharide
- Glycan.iterlinks(apply_fn=<function identity>, substituents=False, method='dfs', visited=None)
Iterates over all Link objects in Glycan.
- Parameters
substituents (bool) – If substituents is True, then include the Link objects in substituent_links on each Monosaccharide
method (str or function) – The traversal method controlling the order of the nodes visited
visited (None or set) – The collection of id values to ignore when traversing
- Yields
Link
- Glycan._get_traversal_method(method)
An internal helper method used to resolve traversal methods by name or alias.
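A method argument may be a name, an alias, or a callable. The sketch below shows one way to resolve it. The registry, the function names, and the choice of ValueError for unknown names are all hypothetical, and the toy graph is a plain dict of child ids.

```python
from collections import deque
from typing import Callable, Dict, Iterator, List, Union

# Hypothetical toy graph: node id -> ordered child ids.
TREE: Dict[int, List[int]] = {1: [2, 3], 2: [4], 3: [], 4: []}


def dfs(root: int) -> Iterator[int]:
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(reversed(TREE[node]))


def bfs(root: int) -> Iterator[int]:
    queue = deque([root])
    while queue:
        node = queue.popleft()
        yield node
        queue.extend(TREE[node])


# Names and aliases resolve to the same strategy functions.
TRAVERSAL_METHODS: Dict[str, Callable[[int], Iterator[int]]] = {
    "dfs": dfs,
    "depth_first_traversal": dfs,
    "bfs": bfs,
    "breadth_first_traversal": bfs,
}


def get_traversal_method(method: Union[str, Callable]) -> Callable:
    """Pass callables through unchanged; look strings up by name or alias."""
    if callable(method):
        return method
    try:
        return TRAVERSAL_METHODS[method]
    except KeyError:
        raise ValueError(f"unknown traversal method: {method!r}") from None


def iternodes(root: int, method: Union[str, Callable] = "dfs") -> Iterator[int]:
    return get_traversal_method(method)(root)


print(list(iternodes(1)))                             # [1, 2, 4, 3]
print(list(iternodes(1, "breadth_first_traversal")))  # [1, 2, 3, 4]
print(list(iternodes(1, lambda r: iter([r]))))        # [1]
```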
Specialized Traversals
- Glycan.leaves(bidirectional=False, method='dfs', visited=None)
Iterates over all Monosaccharide objects in Glycan, yielding only those that have no child nodes.
- Parameters
bidirectional (bool) – If bidirectional is True, then only yield Monosaccharide objects with only one entry in links.
method (str or function) – The traversal method controlling the order of the nodes visited
visited (None or set) – The collection of id values to ignore when traversing
- Yields
Monosaccharide
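The bidirectional flag switches between two notions of "leaf", which can be compared on a toy tree. In this hypothetical sketch, neighbors() stands in for a node's full link list, covering both parent and child links. With bidirectional=True, a root that has a single child is therefore also reported.

```python
from typing import Dict, Iterator, List

# Hypothetical toy tree: node id -> child ids; node 1 is the root.
CHILDREN: Dict[int, List[int]] = {1: [2], 2: [3, 4], 3: [], 4: []}


def neighbors(node: int) -> List[int]:
    """Every node linked to `node` in either direction, like a full link list."""
    parents = [p for p, kids in CHILDREN.items() if node in kids]
    return parents + CHILDREN[node]


def leaves(root: int, bidirectional: bool = False) -> Iterator[int]:
    stack = [root]
    while stack:
        node = stack.pop()
        if bidirectional:
            is_leaf = len(neighbors(node)) == 1  # exactly one link in either direction
        else:
            is_leaf = not CHILDREN[node]         # no child links
        if is_leaf:
            yield node
        stack.extend(reversed(CHILDREN[node]))


print(list(leaves(1)))                      # [3, 4]
print(list(leaves(1, bidirectional=True)))  # [1, 3, 4]
```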
Canonicalization
The same glycan structure can be constructed or written in multiple ways, but all of them should have the same representation. That representation is derived by applying a canonicalization algorithm to the structure, which sorts the branches of each node into the order in which they should be traversed.
If a structure has been constructed manually, the user should call Glycan.canonicalize() before assuming that identical structures will have the same traversal paths.
- Glycan.canonicalize(canonicalizer=None, **kwargs)
Canonicalize this glycan, sorting the order in which its links from the same monosaccharide are traversed.
This currently uses the GlycoCT canonicalization algorithm.
- Parameters
canonicalizer (subclass of CanonicalizerBase, optional) – The canonicalization algorithm to use
**kwargs – Forwarded to the canonicalizer
- Returns
This glycan, reordered in place.
- Return type
Glycan
Equality Comparison
Glycan objects support the equality comparison operators == and !=. They also support hashing, using the hash() value of the canonical GlycoCT representation of the structure.
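Hashing through a canonical text form makes two structures that were built with their branches in different orders compare and hash equal. The sketch below substitutes a much simpler canonical form than GlycoCT, with children sorted by their serialized text. It uses a hypothetical ToyGlycan class and illustrates the idea only.

```python
from typing import Tuple

# Hypothetical tree: (residue name, ((linkage position, subtree), ...))
Tree = Tuple[str, tuple]


def canonical_string(tree: Tree) -> str:
    """Serialize with each node's children sorted, so input order no longer matters."""
    name, children = tree
    parts = sorted(f"{pos}:{canonical_string(child)}" for pos, child in children)
    return name + "(" + ",".join(parts) + ")"


class ToyGlycan:
    def __init__(self, tree: Tree):
        self.tree = tree

    def __eq__(self, other):
        if not isinstance(other, ToyGlycan):
            return NotImplemented
        return canonical_string(self.tree) == canonical_string(other.tree)

    def __hash__(self):
        # Hash the canonical text, so equal structures always hash equally.
        return hash(canonical_string(self.tree))


a = ToyGlycan(("GlcNAc", ((4, ("Man", ())), (6, ("Fuc", ())))))
b = ToyGlycan(("GlcNAc", ((6, ("Fuc", ())), (4, ("Man", ())))))  # same branches, other order
print(canonical_string(a.tree))                 # GlcNAc(4:Man(),6:Fuc())
print(a == b, hash(a) == hash(b), len({a, b}))  # True True 1
```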
- Glycan.exact_ordering_equality(other)
Two glycans are considered equal if their nodes are identically ordered.
See also
glypy.structure.Monosaccharide.exact_ordering_equality()
Exact Matching
- Glycan.topological_equality(other)
Two glycans are considered equal if they are topologically equal.
See also
glypy.structure.Monosaccharide.topological_equality()
Topological Matching
Ambiguous Structures
When a structure has unknown or ambiguous connections between its nodes, AmbiguousLink instances may be used to express the possible options, or their locations may be expressed with an unknown position constant, represented by -1. Two methods are included to detect these scenarios, and one is used to iterate over the possible configuration states described by AmbiguousLink instances.
Support for ambiguous connections is only partial. For instance, glypy can read UND sections from GlycoCT, but does not attempt to render them.
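Enumerating configuration states amounts to taking the Cartesian product of the candidate placements of each ambiguous link. The sketch below does this with itertools.product over hypothetical candidate tuples of (parent, child, parent position, child position). Unlike iterconfiguration(), it does not check whether a combination is valid, for example two links competing for the same site.

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple

# (parent, child, parent position, child position); residue names are hypothetical.
Option = Tuple[str, str, int, int]

# Hypothetical ambiguous links: link id -> candidate placements.
AMBIGUOUS: Dict[int, List[Option]] = {
    1: [("Man1", "GlcNAc3", 2, 1), ("Man1", "GlcNAc3", 4, 1)],  # position unknown
    2: [("Man1", "Gal4", 3, 1), ("Man2", "Gal4", 3, 1)],        # parent unknown
}


def iter_configurations(ambiguous: Dict[int, List[Option]]) -> Iterator[List[Tuple[int, Option]]]:
    """Yield one list of (link id, chosen option) pairs per combination of choices."""
    link_ids = sorted(ambiguous)
    for choice in product(*(ambiguous[i] for i in link_ids)):
        yield list(zip(link_ids, choice))


configurations = list(iter_configurations(AMBIGUOUS))
print(len(configurations))  # 4
print(configurations[0])    # [(1, ('Man1', 'GlcNAc3', 2, 1)), (2, ('Man1', 'Gal4', 3, 1))]
```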
- Glycan.ambiguous_links()
Locate all links which are AmbiguousLink objects.
- Returns
list of ambiguous links
- Return type
list
- Glycan.has_undefined_linkages()
Check if this structure has undefined or ambiguous connectivity between its nodes.
- Returns
Whether any of its links are AmbiguousLink instances or have unknown positions (-1).
- Return type
bool
- Glycan.iterconfiguration()
Iterate over all valid configurations of ambiguous linkages.
During calculation, the AmbiguousLink objects may be mutated, but by the time a new configuration is yielded, all changes should have been reversed. If an error occurs during configuration adjustment, it may not be possible to restore the object to its original state.
- Yields
tuple of (AmbiguousLink, Monosaccharide, Monosaccharide, int, int) – The ambiguous link, the parent chosen, the child chosen, the parent linkage site chosen, and the child linkage site chosen
Examples
>>> from glypy.io import glyspace
>>> structure_record = glyspace.get("G81339YK")
>>> structure = structure_record.structure_
>>> configurations = []
>>> for config_list in structure.iterconfiguration():
...     instance = structure.clone()
...     for link, conf in config_list:
...         link = instance.get_link(link.id)
...         parent = instance.get(conf[0].id)
...         child = instance.get(conf[1].id)
...         link.reconfigure(parent, child, conf[2], conf[3])
...     configurations.append(instance)
>>> len(configurations)
4
Serialization
There are many ways to write glycan structures as text. By default, glypy renders Glycan instances using GlycoCT, but the Glycan.serialize() method can be used to specify different serialization formats. For more information on those options, see glypy.io.
When converting a Glycan to a string, Glycan.serialize() will be used with its default argument.
- Glycan.serialize(name='glycoct')
Convert the structure to text.
The serialization format is given by name; see available_serializers() for the supported formats.
- classmethod Glycan.register_serializer(name, method)
Add method as name to the set of serializers to pick from in serialize().
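A name-to-function registry is one way for serialize() and register_serializer() to fit together. In the hypothetical ToyGlycan below, the default format is a made-up "dash" serializer rather than GlycoCT. Its __str__ delegates to serialize() with the default argument, mirroring the behavior described above.

```python
from typing import Callable, Dict, List


class ToyGlycan:
    """Hypothetical glycan with a class-level serializer registry."""

    _serializers: Dict[str, Callable[["ToyGlycan"], str]] = {}

    def __init__(self, residues: List[str]):
        self.residues = residues

    @classmethod
    def register_serializer(cls, name: str, method: Callable[["ToyGlycan"], str]) -> None:
        cls._serializers[name] = method

    @classmethod
    def available_serializers(cls) -> List[str]:
        return list(cls._serializers)

    def serialize(self, name: str = "dash") -> str:
        try:
            method = self._serializers[name]
        except KeyError:
            raise ValueError(f"unknown format {name!r}; choose from {self.available_serializers()}") from None
        return method(self)

    def __str__(self) -> str:
        return self.serialize()  # uses the default argument


ToyGlycan.register_serializer("dash", lambda g: "-".join(g.residues))
ToyGlycan.register_serializer("count", lambda g: str(len(g.residues)))

g = ToyGlycan(["Man", "Man", "GlcNAc"])
print(str(g))                             # Man-Man-GlcNAc
print(g.serialize("count"))               # 3
print(ToyGlycan.available_serializers())  # ['dash', 'count']
```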
Mass Spectrometry Utilities
glypy was originally written to support software for mass spectrometry experiments on glycans. Like all molecular objects in the library, Glycan objects support the Glycan.mass() and Glycan.total_composition() methods. Additionally, they can generate glycosidic and cross-ring fragments, as well as internal fragments caused by any combination of the two.
- Glycan.total_composition(method='dfs')
Computes the sum of the compositions of all Monosaccharide objects in self
- Return type
Composition
- Glycan.mass(average=False, charge=0, mass_data=None, method='dfs')
Calculates the total mass of the intact graph by querying each node for its mass.
- Parameters
average (bool) – Whether or not to use the average isotopic composition when calculating masses. When average == False, masses are calculated using monoisotopic mass.
charge (int) – If charge is non-zero, m/z is calculated, where m is the theoretical mass and z is charge
mass_data (dict) – If mass_data is None, standard NIST mass and isotopic abundance data are used. Otherwise the contents of mass_data are assumed to contain elemental mass and isotopic abundance information.
- Return type
float
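As a worked example of the charge parameter, the sketch below sums monoisotopic element masses for a composition and converts the result to m/z. It assumes the charge comes from added protons (removed protons for a negative charge), so m/z = (M + z * m_proton) / |z|. The element and proton masses are hard-coded here rather than taken from glypy's mass_data, and average masses are not handled.

```python
from typing import Dict

# Monoisotopic element masses and the proton mass in Da, hard-coded for this sketch.
MONOISOTOPIC: Dict[str, float] = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}
PROTON = 1.00727646677


def composition_mass(composition: Dict[str, int], charge: int = 0) -> float:
    """Neutral monoisotopic mass, or m/z when charge != 0 (charge carried by protons)."""
    mass = sum(MONOISOTOPIC[element] * count for element, count in composition.items())
    if charge == 0:
        return mass
    return (mass + charge * PROTON) / abs(charge)


def combine(*compositions: Dict[str, int]) -> Dict[str, int]:
    """Sum element counts, like adding up the compositions of each residue."""
    total: Dict[str, int] = {}
    for composition in compositions:
        for element, count in composition.items():
            total[element] = total.get(element, 0) + count
    return total


HEXOSE_RESIDUE = {"C": 6, "H": 10, "O": 5}  # a hexose minus one water
WATER = {"H": 2, "O": 1}                    # one water restored for the chain ends
hex3 = combine(HEXOSE_RESIDUE, HEXOSE_RESIDUE, HEXOSE_RESIDUE, WATER)

print(hex3)                                        # {'C': 18, 'H': 32, 'O': 16}
print(round(composition_mass(hex3), 4))            # 504.169
print(round(composition_mass(hex3, charge=2), 4))  # 253.0918
```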
Fragmentation
- Glycan.fragments(kind='BY', max_cleavages=1, average=False, charge=0, mass_data=None, traversal_method='dfs')
Generate carbohydrate backbone fragments from this glycan by examining the disjoint subtrees created by removing one or more monosaccharide-monosaccharide bonds.
- Parameters
kind (Iterable) – Any Iterable of characters corresponding to A/B/C/X/Y/Z as published by Domon and Costello
max_cleavages (int) – The maximum number of bonds to break per fragment
average (bool, optional, defaults to False) – Whether or not to use the average isotopic composition when calculating masses. When average == False, masses are calculated using monoisotopic mass.
charge (int, optional, defaults to 0) – If charge is non-zero, m/z is calculated, where m is the theoretical mass and z is charge
mass_data (dict, optional, defaults to None) – If mass_data is None, standard NIST mass and isotopic abundance data are used. Otherwise the contents of mass_data are assumed to contain elemental mass and isotopic abundance information.
- Yields
GlycanFragment
See also
glypy.composition.composition.calculate_mass(), subtrees(), crossring_subtrees(), Subtree.to_fragments()
- Glycan.name_fragment(fragment)
Attempt to assign a full name to a fragment based on its branch and its position relative to the reducing end, alongside its A/B/C/X/Y/Z type, according to Domon and Costello.
The formal grammar for fragment names in Backus-Naur Form:
<full-name> ::= <fragment-name>|<fragment-name-list>
<fragment-name> ::= <glycosidic-fragment-name>|<crossring-fragment-name>
<fragment-name-list> ::= <fragment-name>"-"<fragment-name-list>|<fragment-name>
<glycosidic-fragment-name> ::= <branch-identifier><fragment-type><index>
<crossring-fragment-name> ::= <ring-coordinate><fragment-type><branch-identifier><index>
<fragment-type> ::= "A" | "B" | "C" | "X" | "Y" | "Z"
<ring-coordinate> ::= <integer>","<integer>
<index> ::= <integer>
<integer> ::= <digit>|<integer><digit>
<digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
<branch-identifier> ::= <letter>|<letter><digit>|""
<letter> ::= "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
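The grammar can be transcribed into a regular expression that checks whether a string is a well-formed fragment name. This is a sketch, not glypy's parser. The grammar is ambiguous when a cross-ring name has a letter-plus-digit branch identifier followed by an index (for example "0,2Aa12"), so the pattern only validates names and does not split them into parts.

```python
import re

# Regular-expression transcription of the grammar (a sketch, not glypy's parser).
BRANCH = r"(?:[a-z][0-9]?)?"  # <branch-identifier>, possibly empty
FRAGMENT_TYPE = r"[ABCXYZ]"   # <fragment-type>
INDEX = r"[0-9]+"             # <index>
RING = r"[0-9]+,[0-9]+"       # <ring-coordinate>

GLYCOSIDIC = BRANCH + FRAGMENT_TYPE + INDEX
CROSSRING = RING + FRAGMENT_TYPE + BRANCH + INDEX
FRAGMENT_NAME = f"(?:{CROSSRING}|{GLYCOSIDIC})"
FULL_NAME = re.compile(f"{FRAGMENT_NAME}(?:-{FRAGMENT_NAME})*")


def is_valid_fragment_name(name: str) -> bool:
    return FULL_NAME.fullmatch(name) is not None


for candidate in ["Y2", "bY3", "0,2A4", "0,2Xa2", "Y2-B3", "Q2", "Y", "2,4"]:
    print(candidate, is_valid_fragment_name(candidate))
```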
Note
There are also helper methods which modify the called object iteratively, restoring the original state after the generator is complete. They should not be used directly; instead, see Glycan.fragments() and Glycan.substructures().
- Glycan.break_links_subtrees(n_links)
Iteratively generate all subtrees from glycosidic bond cleavages, creating all \(2\binom{L}{n}\) subtrees, where \(L\) is the number of links in the structure and \(n\) is n_links.
- Parameters
n_links (int) – Number of links to break simultaneously
- Yields
Subtree
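For a single cleavage (n = 1), with L the number of links, the count 2·C(L, n) reduces to 2L, because every link splits a tree into exactly two subtrees. The sketch below checks this on a toy tree given as node ids and (parent, child) edges. It is hypothetical and covers only the n = 1 case.

```python
from math import comb
from typing import Dict, FrozenSet, Iterator, List, Set, Tuple

Edge = Tuple[int, int]  # (parent id, child id)

# Hypothetical toy tree with L = 4 links.
NODES: Set[int] = {1, 2, 3, 4, 5}
EDGES: List[Edge] = [(1, 2), (2, 3), (2, 4), (1, 5)]


def components(nodes: Set[int], edges: List[Edge]) -> List[FrozenSet[int]]:
    """Connected components of the undirected graph, found by repeated DFS."""
    adjacency: Dict[int, Set[int]] = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen: Set[int] = set()
    result: List[FrozenSet[int]] = []
    for start in sorted(nodes):
        if start in seen:
            continue
        component: Set[int] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adjacency[node] - component)
        seen |= component
        result.append(frozenset(component))
    return result


def single_cleavage_subtrees(nodes: Set[int], edges: List[Edge]) -> Iterator[FrozenSet[int]]:
    """Break each link in turn (n = 1) and yield both resulting subtrees."""
    for broken in edges:
        remaining = [edge for edge in edges if edge != broken]
        yield from components(nodes, remaining)


subtrees = list(single_cleavage_subtrees(NODES, EDGES))
print(len(subtrees), 2 * comb(len(EDGES), 1))  # 8 8
print(subtrees[:2])  # [frozenset({1, 5}), frozenset({2, 3, 4})]
```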
- Glycan.crossring_subtrees(n_links)
Generate all combinations of cross-ring fragments and glycosidic cleavages, cleaving between 1 and n_links monosaccharides, paired with n_links - 1 to 0 glycosidic cleavages.
- Parameters
n_links (int) – Total number of breaks to create, counting both cross-ring cleavages and complementary glycosidic cleavages.
- Yields
Subtree
Sub-Structures
- Glycan.substructures(max_cleavages=1, min_cleavages=1, inplace=False)
Generate disjoint subtrees from this glycan by removing one or more monosaccharide-monosaccharide bonds.
Miscellaneous
- Glycan.clone(index_method='dfs', visited=None, cls=None)
Create a copy of self, indexed using index_method, which is a traversal method or None.
- Glycan.set_reducing_end(value)
Sets the reducing end type, and configures the root Monosaccharide appropriately.
If the reducing_end is not None, then the following state changes are made to root:
self.root.ring_start = 0
self.root.ring_end = 0
self.root.anomer = "uncyclized"
Otherwise, the correct state is unknown:
self.root.ring_start = UnknownPosition
self.root.ring_end = UnknownPosition
self.root.anomer = None
Note
This method is called automatically when setting reducing_end, and does not need to be used explicitly.
Glycan objects support root() and tree(), returning root and the object itself, respectively.