Glycan Structures

Represent polysaccharide molecules and their associated functions

Represent a sugar graph with pseudo-directed edges.

class glypy.structure.glycan.Glycan(root=None, index_method='dfs', canonicalize=False)[source]

Represents a full graph of connected Monosaccharide objects and their connecting bonds.

Variables
  • root (Monosaccharide) – The first monosaccharide unit of the glycan, and the reducing end if present.

  • index (list) – A list of the Monosaccharide instances in self, in the order they are encountered during traversal by traversal_methods[index_method]

  • link_index (list) – A list of the Link instances connecting the Monosaccharide instances in self, in the order they are encountered during traversal by traversal_methods[index_method]

  • reducing_end (ReducedEnd or None) – The reducing end on root.

  • branch_lengths (dict) – A dictionary mapping branch symbols to their lengths

  • branch_parent_map (dict) – A dictionary mapping branch symbols to their parent branch symbols

Indexing

Glycans support __getitem__() on index, as well as several other methods related to finding elements and building and maintaining unique indices.

Glycan.__getitem__(ix)[source]

Fetch a Monosaccharide from index.

Return type

Monosaccharide

Raises

IndexError: – If the provided ix exceeds the length of the index, or if index has not been populated.

Glycan.get(ix)[source]

Get a Monosaccharide from this structure by its id value.

If index is populated it will be iterated over, otherwise __iter__() will be called.

Parameters

ix (int or tuple) – The id value to search for.

Return type

Monosaccharide

Raises

IndexError: – If the id value is not found

Glycan.get_link(ix)[source]

Search for a Link by id value.

This will use iterlinks() to iterate over the linkages in the structure

Parameters

ix (int) – The link index to search for.

Return type

Link

Raises

IndexError: – if the id value is not found

Sizing

Glycan.order(deep=False)[source]

The number of nodes in the graph. __len__() is an alias of this

Return type

int

Glycan.count_branches()[source]

Count the number of branches in the Glycan tree

Return type

int

Different branches may have different lengths. An indexed Glycan’s branch_lengths dict holds a mapping from branch label to length. When an existing branch forks, each child branch is given a new label, but the parent branch is as long as its longest child, and each child branch is at least as long as its parent + 1.

Ordering and Index Building

Glycan.reroot(index_method='dfs')[source]

Set root to the node with the lowest id.

Should only be used if the glycan has been indexed.

Parameters

index_method (str, optional) – The name of the index method to use to reindex the glycan relative to the new root node. If None, no reindexing is done. The default is ‘dfs’

Returns

self

Return type

Glycan

Glycan.reindex(method='dfs')[source]

Traverse the graph using the function specified by method. The order of traversal defines the new id value for each Monosaccharide and Link.

The order of traversal also defines the ordering of the Monosaccharide in index and Link in link_index.

Prior to constructing a Glycan instance, component Monosaccharide instances may be labeled, converting their id field into a tuple.

Calls label_branches() after indexing is complete.

Returns

self

Return type

Glycan
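As a rough illustration of how traversal order can define id values, the toy sketch below renumbers the nodes of a small tree in depth-first order. The Node class and the reindex function are hypothetical stand-ins written for this example, and numbering from 1 is an assumption of the sketch; none of this is glypy's implementation.

```python
class Node:
    """Hypothetical stand-in for a Monosaccharide: a name, an id, and children."""

    def __init__(self, name, children=()):
        self.name = name
        self.id = None
        self.children = list(children)


def dfs(root):
    """Yield nodes depth-first, finishing one branch before starting the next."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Push children reversed so the first child is visited first.
        stack.extend(reversed(node.children))


def reindex(root, traversal=dfs):
    """Assign ids in traversal order and return the nodes in that same order."""
    index = []
    for new_id, node in enumerate(traversal(root), start=1):
        node.id = new_id
        index.append(node)
    return index


tree = Node("root", [Node("a", [Node("a1")]), Node("b")])
index = reindex(tree)
print([(node.name, node.id) for node in index])
```

Passing a different traversal function would change both the id values and the order of the returned index, which is the coupling the description above refers to.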

Glycan.deindex()[source]

When combining two Glycan structures, very often their component ids will overlap, making it impossible to differentiate between a cycle and the new graph. This function mangles all of the node and link ids so that they are distinct from the pre-existing nodes.

Returns

self

Return type

Glycan

Glycan.label_branches()[source]

Labels each branch point with an alphabetical symbol. Also computes each branch’s length and stores it in branch_lengths. Sets branch_lengths of self and Link.label for each link attached to self. Also populates branch_parent_map.

Branch symbols are increasing alphabetical characters. The root branch is denoted ‘-’, though glycans having a root with multiple children will not have any actual branches with that label.

Link.label updates use the current branch symbol, and the index of that link along that branch.

Note

Labeling always uses a depth-first traversal of nodes.

Traversal

Glycan structures may be linear or branching, and can be traversed in many ways. By default, Glycan.depth_first_traversal() is used, which fully traverses one branch before visiting another, but other methods are available. Some methods simply control the behavior of the iterator but do not control the order of iteration, and take a method argument where either the name of the traversal method or a callable is specified.

Glycan objects implement the Iterable interface, and their __iter__() method uses Glycan.depth_first_traversal().

Glycan.depth_first_traversal(from_node=None, apply_fn=<function identity>, visited=None)[source]

Make a depth-first traversal of the glycan graph. Children are explored in descending bond-order.

This is the default traversal method for all Glycan objects. dfs() is an alias of this method. Both names can be used to specify this strategy to _get_traversal_method().

When selecting an iteration strategy, this strategy is specified as “dfs”.

Parameters
  • from_node (None or Monosaccharide) – If from_node is None, then traversal starts from the root node. Otherwise it begins from the given node.

  • apply_fn (function) – A function applied to each node on arrival. If this function returns a non-None value, the result is yielded from the generator, otherwise it is ignored. Defaults to identity()

  • visited (set or None) – A set of node ID values to ignore. If None, defaults to the empty set

Yields

Return Value of apply_fn, by default Monosaccharide

Glycan.breadth_first_traversal(from_node=None, apply_fn=<function identity>, visited=None)[source]

Make a breadth-first traversal of the glycan graph. Children are explored in descending bond-order.

When selecting an iteration strategy, this strategy is specified as “bfs”.

Parameters
  • from_node (None or Monosaccharide) – If from_node is None, then traversal starts from the root node. Otherwise it begins from the given node.

  • apply_fn (function) – A function applied to each node on arrival. If this function returns a non-None value, the result is yielded from the generator, otherwise it is ignored. Defaults to identity()

  • visited (set or None) – A set of node ID values to ignore. If None, defaults to the empty set

Yields

Return Value of apply_fn, by default Monosaccharide
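The difference between the depth-first and breadth-first orders, and the apply_fn and visited conventions described above, can be sketched on a toy tree. Everything here (the CHILDREN table, the traverse function, and visiting children in list order rather than by bond order) is a hypothetical simplification, not glypy code.

```python
from collections import deque

# Hypothetical toy tree: node id -> list of child ids.
CHILDREN = {1: [2, 5], 2: [3, 4], 3: [], 4: [], 5: [6], 6: []}


def identity(node):
    return node


def traverse(root, order="dfs", apply_fn=identity, visited=None):
    """Yield apply_fn(node) for each reachable node, skipping None results."""
    visited = set() if visited is None else set(visited)
    pending = deque([root])
    while pending:
        # Popping from the right acts as a stack (DFS);
        # popping from the left acts as a queue (BFS).
        node = pending.pop() if order == "dfs" else pending.popleft()
        if node in visited:
            continue
        visited.add(node)
        result = apply_fn(node)
        if result is not None:
            yield result
        children = CHILDREN[node]
        pending.extend(reversed(children) if order == "dfs" else children)


print(list(traverse(1, "dfs")))
print(list(traverse(1, "bfs")))
print(list(traverse(1, apply_fn=lambda n: n if n % 2 == 0 else None)))
print(list(traverse(1, visited={2})))
```

In this sketch a node listed in visited is skipped together with everything reachable only through it, and an apply_fn that returns None acts as a filter.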

Glycan.indexed_traversal(from_node=None, apply_fn=<function identity>, visited=None)[source]

Traverse the glycan structure along index.

This is substantially faster than other traversal methods for complete traversals, at the cost of a) requiring a call to reindex() to populate index if it has not already been populated, and b) not being updated automatically if the structure is modified.

When selecting an iteration strategy, this strategy is specified as “index”.

Parameters
  • from_node (None or Monosaccharide) – If from_node is None, then traversal starts from the root node. Otherwise it begins from the given node.

  • apply_fn (function) – A function applied to each node on arrival. If this function returns a non-None value, the result is yielded from the generator, otherwise it is ignored. Defaults to identity()

  • visited (set or None) – A set of node ID values to ignore. If None, defaults to the empty set

Yields

Return Value of apply_fn, by default Monosaccharide

See also

reindex

Glycan.iternodes(from_node=None, apply_fn=<function identity>, method='dfs', visited=None)[source]

Generic iterator over nodes dispatching to a strategy given by method, defaulting to depth_first_traversal().

Parameters
  • from_node (None or Monosaccharide) – If from_node is None, then traversal starts from the root node. Otherwise it begins from the given node.

  • apply_fn (function) – A function applied to each node on arrival. If this function returns a non-None value, the result is yielded from the generator, otherwise it is ignored. Defaults to identity()

  • method (str or function) – Traversal method to use. See _get_traversal_method()

  • visited (set or None) – A set of node ID values to ignore. If None, defaults to the empty set

Yields

Return Value of apply_fn, by default Monosaccharide

Glycan.iterlinks(substituents=False, method='dfs', visited=None)[source]

Iterates over all Link objects in Glycan.

Parameters
  • substituents (bool) – If substituents is True, then include the Link objects in substituent_links on each Monosaccharide

  • method (str or function) – The traversal method controlling the order of the nodes visited

  • visited (None or set) – The collection of id values to ignore when traversing

Yields

Link

Glycan._get_traversal_method(method)[source]

An internal helper method used to resolve traversal methods by name or alias.

Parameters

method (str or Callable) –

If a str, it is looked up in the class-level traversal_methods dictionary and the name of the appropriate method is retrieved with getattr() and returned.

If a Callable, the function’s first parameter is bound to self and returned.

Return type

Callable
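The name-or-callable resolution described above can be sketched with a class-level lookup table, getattr(), and functools.partial. The Walker class, its method names, and the use of functools.partial for binding are assumptions made for this illustration; the source does not say how the binding is performed.

```python
import functools


class Walker:
    """Hypothetical class showing name-or-callable dispatch of traversal methods."""

    # Class-level table mapping strategy names and aliases to method names.
    traversal_methods = {
        "dfs": "depth_first",
        "depth_first": "depth_first",
        "bfs": "breadth_first",
    }

    def __init__(self, items):
        self.items = list(items)

    def depth_first(self):
        return list(self.items)

    def breadth_first(self):
        return list(reversed(self.items))

    def get_traversal_method(self, method):
        if isinstance(method, str):
            # Look up the method name, then fetch the bound method from self.
            return getattr(self, self.traversal_methods[method])
        if callable(method):
            # Bind self as the first argument of an arbitrary callable.
            return functools.partial(method, self)
        raise TypeError(f"Cannot resolve a traversal method from {method!r}")


walker = Walker([1, 2, 3])
every_other = walker.get_traversal_method(lambda self: self.items[::2])
print(walker.get_traversal_method("dfs")(), every_other())
```

Both branches return something that can be called with no further reference to the instance, so callers can treat names and custom functions uniformly.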

Specialized Traversals

Glycan.leaves(bidirectional=False, method='dfs', visited=None)[source]

Iterates over all Monosaccharide objects in Glycan, yielding only those that have no child nodes.

Parameters
  • bidirectional (bool) – If bidirectional is True, then yield only Monosaccharide objects with exactly one entry in links.

  • method (str or function) – The traversal method controlling the order of the nodes visited

  • visited (None or set) – The collection of id values to ignore when traversing

Yields

Monosaccharide

Canonicalization

The same glycan structure can be constructed/written multiple ways, but they should all have the same representation. That representation is derived by applying a canonicalization algorithm to the structure, which will sort the branches of each node according to the order they should be traversed in.

If a structure has been constructed manually, the user should call Glycan.canonicalize() before assuming that identical structures will have the same traversal paths.

Glycan.canonicalize(canonicalizer=None, **kwargs)[source]

Canonicalize this glycan, sorting the order in which its links from the same monosaccharide are traversed.

This currently uses the GlycoCT canonicalization algorithm.

Parameters
  • canonicalizer (subclass of CanonicalizerBase, optional) – The canonicalization algorithm to use

  • **kwargs – Forwarded to the canonicalizer

Returns

This glycan, reordered in place.

Return type

Glycan
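To see why canonicalization matters for traversal, consider the toy sketch below. It builds the same tree with its children in two different orders and shows that their depth-first paths only agree after the children are sorted deterministically. The nested-tuple tree, the residue labels, and sorting by repr() are all assumptions of this example; repr() ordering is a stand-in for the GlycoCT rules, and the sketch returns a copy rather than reordering in place.

```python
def canonicalize(node):
    """Return a copy of (label, children) with children in a deterministic order."""
    label, children = node
    ordered = sorted((canonicalize(child) for child in children), key=repr)
    return (label, tuple(ordered))


def dfs_labels(node):
    """Yield labels depth-first, visiting children in their stored order."""
    label, children = node
    yield label
    for child in children:
        yield from dfs_labels(child)


# The same hypothetical structure, built with its branches in different orders.
first = ("Man", [("GlcNAc", []), ("Gal", [("Fuc", [])])])
second = ("Man", [("Gal", [("Fuc", [])]), ("GlcNAc", [])])

print(list(dfs_labels(first)), list(dfs_labels(second)))
print(list(dfs_labels(canonicalize(first))))
```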

Equality Comparison

Glycan objects support equality comparison operators, == and !=. They also support hashing, using the hash() value of the canonical GlycoCT representation of the structure.

Glycan.exact_ordering_equality(other)[source]

Two glycans are considered equal if their nodes are identically ordered.

Parameters

other (Glycan) –

Return type

bool

See also

glypy.structure.Monosaccharide.exact_ordering_equality() Exact Matching

Glycan.topological_equality(other)[source]

Two glycans are considered equal if they are topologically equal.

Parameters

other (Glycan) –

Return type

bool

See also

glypy.structure.Monosaccharide.topological_equality() Topological Matching

Glycan.__eq__(other)[source]

Test for exact ordering equality

Parameters

other (Glycan) –

Return type

bool

Glycan.__hash__()[source]

Hashes the structure from the GlycoCT text representation

See also

serialize()
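The pairing of an ordering-sensitive __eq__() with a hash taken from a canonical text form can be sketched as follows. The Tree class and its two text methods are hypothetical, and the bracketed text format is invented for this example; it is not GlycoCT.

```python
class Tree:
    """Hypothetical tree: equality is order-sensitive, the hash is canonical."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

    def ordered_text(self):
        """Serialize children in their stored order."""
        inner = ",".join(child.ordered_text() for child in self.children)
        return f"{self.label}({inner})"

    def canonical_text(self):
        """Serialize children in sorted order, independent of construction."""
        inner = ",".join(sorted(child.canonical_text() for child in self.children))
        return f"{self.label}({inner})"

    def __eq__(self, other):
        if not isinstance(other, Tree):
            return NotImplemented
        return self.ordered_text() == other.ordered_text()

    def __hash__(self):
        return hash(self.canonical_text())


a = Tree("root", [Tree("x"), Tree("y")])
b = Tree("root", [Tree("x"), Tree("y")])
c = Tree("root", [Tree("y"), Tree("x")])
print(a == b, a == c, hash(a) == hash(c))
```

Objects that compare equal always share a hash here, which is what Python requires; differently ordered copies share a hash without comparing equal, which is permitted.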

Ambiguous Structures

When a structure has unknown or ambiguous connections between its nodes, AmbiguousLink instances may be used to express the possible options, or their locations may be expressed with an unknown position constant, represented with -1. Two methods are included to detect these scenarios, and one is used to iterate over possible configuration states described by AmbiguousLink.

Support for ambiguous connections is only partial. For instance, glypy can read UND sections from GlycoCT, but does not attempt to render them.

Glycan.ambiguous_links()[source]

Locate all links which are AmbiguousLink objects

Returns

list of ambiguous links

Return type

list

Glycan.has_undefined_linkages()[source]

Check if this structure has undefined or ambiguous connectivity between its nodes.

Returns

If any of its links are AmbiguousLink instances, or have unknown positions (-1).

Return type

bool

Glycan.iterconfiguration()[source]

Iterate over all valid configurations of ambiguous linkages.

During calculation, the AmbiguousLink objects may be mutated, but by the time a new configuration is yielded all changes should be reversed. If an error occurs during configuration adjustment, it may not be possible to restore the object to its original state.

Yields

tuple of (AmbiguousLink, Monosaccharide, Monosaccharide, int, int) – The ambiguous link, the parent chosen, the child chosen, the parent linkage site chosen, and the child linkage site chosen

Examples

>>> from glypy.io import glyspace
>>> structure_record = glyspace.get("G81339YK")
>>> structure = structure_record.structure_
>>> configurations = []
>>> for config_list in structure.iterconfiguration():
...     instance = structure.clone()
...     for link, conf in config_list:
...         link = instance.get_link(link.id)
...         parent = instance.get(conf[0].id)
...         child = instance.get(conf[1].id)
...         link.reconfigure(parent, child, conf[2], conf[3])
...     configurations.append(instance)
>>> len(configurations)
4
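The count of configurations in an example like the one above can be understood as a Cartesian product over the options of each ambiguous link. The sketch below uses plain tuples and itertools.product; the link names, node names, and option lists are invented, and unlike the real method it makes no attempt to reject invalid combinations.

```python
from itertools import product

# Hypothetical ambiguous links, each with its candidate
# (parent, child, parent_site, child_site) options.
ambiguous = {
    "link-1": [("n1", "n4", 3, 1), ("n1", "n4", 6, 1)],
    "link-2": [("n2", "n5", 2, 1), ("n3", "n5", 2, 1)],
}


def iter_configurations(options_by_link):
    """Yield one list of (link, chosen option) pairs per combination."""
    names = sorted(options_by_link)
    for choice in product(*(options_by_link[name] for name in names)):
        yield list(zip(names, choice))


configurations = list(iter_configurations(ambiguous))
print(len(configurations))
```

With two links of two options each, the product contains four combinations.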

Serialization

There are many ways to write glycan structures as text. By default, glypy will render Glycan instances using GlycoCT, but the Glycan.serialize() method can be used to specify different serialization formats. For more information on those options, see glypy.io.

When converting a Glycan to a string, Glycan.serialize() will be used with its default argument.

Glycan.serialize(name='glycoct')[source]

Convert the structure to text.

The serialization format must be one of the names returned by available_serializers().

Parameters

name (str, optional) – The name of the serialization format (the default is ‘glycoct’)

Return type

str

classmethod Glycan.register_serializer(name, method)[source]

Add method as name to the set of serializers to pick from in serialize()

Parameters
  • name (str) – The name of the serializer

  • method (Callable) – A callable object that when called with a Glycan returns a str

classmethod Glycan.available_serializers()[source]

Get the list of available serialization formats

Return type

list of str
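The three methods above describe a class-level registry of named serializers. A minimal sketch of that pattern, using a hypothetical Structure class and invented "plain" and "upper" formats rather than glypy's own, might look like this:

```python
class Structure:
    """Hypothetical class with a registry of named serializers."""

    _serializers = {}

    def __init__(self, residues):
        self.residues = list(residues)

    @classmethod
    def register_serializer(cls, name, method):
        """Register a callable taking a Structure and returning a str."""
        cls._serializers[name] = method

    @classmethod
    def available_serializers(cls):
        return list(cls._serializers)

    def serialize(self, name="plain"):
        return self._serializers[name](self)

    def __str__(self):
        # String conversion uses serialize() with its default format.
        return self.serialize()


Structure.register_serializer("plain", lambda s: "-".join(s.residues))
Structure.register_serializer("upper", lambda s: "-".join(r.upper() for r in s.residues))

structure = Structure(["a", "b"])
print(str(structure), structure.serialize("upper"), Structure.available_serializers())
```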

Mass Spectrometry Utilities

glypy was originally written to support software for mass spectrometry experiments on glycans. Like all molecular objects in the library, they support the Glycan.mass() and Glycan.total_composition() methods. Additionally, they can generate glycosidic and cross-ring fragments, as well as internal fragments caused by any combination of the two.

Glycan.total_composition(method='dfs')[source]

Computes the sum of the composition of all Monosaccharide objects in self

Return type

Composition

Glycan.mass(average=False, charge=0, mass_data=None, method='dfs')[source]

Calculates the total mass of the intact graph by querying each node for its mass.

Parameters
  • average (bool) – Whether or not to use the average isotopic composition when calculating masses. When average == False, masses are calculated using monoisotopic mass.

  • charge (int) – If charge is non-zero, m/z is calculated, where m is the theoretical mass, and z is charge

  • mass_data (dict) – If mass_data is None, standard NIST mass and isotopic abundance data are used. Otherwise the contents of mass_data are assumed to contain elemental mass and isotopic abundance information.

Return type

float
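As a worked illustration of the charge parameter, the sketch below computes a monoisotopic mass from an elemental composition and converts it to m/z. It assumes the charge is carried by gained or lost protons, which is a common convention but an assumption here, so the library's own charge handling may differ. The mass table holds approximate published values and the function name is hypothetical.

```python
# Approximate monoisotopic masses in unified atomic mass units.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}
PROTON = 1.00727646688


def mass(composition, charge=0):
    """Return the neutral mass, or m/z when charge is non-zero.

    Assumes each unit of charge corresponds to one proton gained
    (positive charge) or lost (negative charge).
    """
    neutral = sum(MONOISOTOPIC[element] * count for element, count in composition.items())
    if charge == 0:
        return neutral
    return (neutral + charge * PROTON) / abs(charge)


hexose = {"C": 6, "H": 12, "O": 6}
print(mass(hexose), mass(hexose, charge=1), mass(hexose, charge=-1))
```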

Fragmentation

Glycan.fragments(kind='BY', max_cleavages=1, average=False, charge=0, mass_data=None, traversal_method='dfs')[source]

Generate carbohydrate backbone fragments from this glycan by examining the disjoint subtrees created by removing one or more monosaccharide-monosaccharide bonds.

Parameters
  • kind (Iterable) – Any Iterable of characters corresponding to A/B/C/X/Y/Z as published by Domon and Costello

  • max_cleavages (int) – The maximum number of bonds to break per fragment

  • average (bool, optional, defaults to False) – Whether or not to use the average isotopic composition when calculating masses. When average == False, masses are calculated using monoisotopic mass.

  • charge (int, optional, defaults to 0) – If charge is non-zero, m/z is calculated, where m is the theoretical mass, and z is charge

  • mass_data (dict, optional, defaults to None) – If mass_data is None, standard NIST mass and isotopic abundance data are used. Otherwise the contents of mass_data are assumed to contain elemental mass and isotopic abundance information.

Yields

GlycanFragment

See also

glypy.composition.composition.calculate_mass(), subtrees(), crossring_subtrees(), Subtree.to_fragments()

Glycan.name_fragment(fragment)[source]

Attempt to assign a full name to a fragment based on the branch and position relative to the reducing end, alongside A/B/C/X/Y/Z, according to Domon and Costello.

The formal grammar for fragment names in Backus-Naur Form:

<full-name>                ::= <fragment-name>|<fragment-name-list>
<fragment-name>            ::= <glycosidic-fragment-name>|<crossring-fragment-name>
<fragment-name-list>       ::= <fragment-name>"-"<fragment-name-list>|<fragment-name>
<glycosidic-fragment-name> ::= <branch-identifier><fragment-type><index>
<crossring-fragment-name>  ::= <ring-coordinates><fragment-type><branch-identifier><index>
<fragment-type>            ::= "A" | "B" | "C" | "X" | "Y" | "Z"
<ring-coordinates>         ::= <integer>","<integer>
<index>                    ::= <integer>
<integer>                  ::= <digit>|<integer><digit>
<digit>                    ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
<branch-identifier>        ::= <letter>|<letter><digit>|""
<letter>                   ::= "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" |
                               "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" |
                               "u" | "v" | "w" | "x" | "y" | "z"
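The grammar above can be turned into a small parser with regular expressions. This is a sketch written from the grammar alone, with a hypothetical function name. One assumption is needed: a cross-ring name such as "1,5Xa12" is ambiguous between branch "a1" with index 2 and branch "a" with index 12, and this sketch arbitrarily prefers the shorter branch identifier.

```python
import re

# <branch-identifier><fragment-type><index>
GLYCOSIDIC = re.compile(r"(?P<branch>(?:[a-z]\d?)?)(?P<kind>[ABCXYZ])(?P<index>\d+)")
# <ring-coordinates><fragment-type><branch-identifier><index>; the lazy \d??
# prefers a branch identifier without a trailing digit when both parses fit.
CROSSRING = re.compile(
    r"(?P<ring>\d+,\d+)(?P<kind>[ABCXYZ])(?P<branch>(?:[a-z]\d??)?)(?P<index>\d+)"
)


def parse_fragment_name(full_name):
    """Split a full name on "-" and parse each fragment name into a dict."""
    parts = []
    for piece in full_name.split("-"):
        match = CROSSRING.fullmatch(piece) or GLYCOSIDIC.fullmatch(piece)
        if match is None:
            raise ValueError(f"Not a valid fragment name: {piece!r}")
        fields = match.groupdict()
        fields["index"] = int(fields["index"])
        fields.setdefault("ring", None)  # glycosidic names have no ring coordinates
        parts.append(fields)
    return parts


print(parse_fragment_name("aB3-Y2"))
print(parse_fragment_name("0,2Aa3"))
```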

Note

There are also helper methods which modify the called object iteratively, restoring the original state after the generator is complete. They should not be used directly, instead see Glycan.fragments() and Glycan.substructures().

Iteratively generate all subtrees from glycosidic bond cleavages, creating all \(2{L \choose n}\) subtrees.

Parameters

n_links (int) – Number of links to break simultaneously

Yields

Subtree
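The idea of breaking every combination of n links can be sketched on a toy tree with itertools.combinations. The edge list and helper functions below are hypothetical and model only connectivity, not the Subtree objects the library yields. With one link broken, each choice splits the tree into two pieces.

```python
from itertools import combinations

# Hypothetical tree given as (parent, child) links.
EDGES = [(1, 2), (2, 3), (2, 4), (1, 5)]


def components(nodes, edges):
    """Split nodes into connected components using the remaining edges."""
    neighbors = {node: set() for node in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, parts = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in part:
                continue
            part.add(node)
            stack.extend(neighbors[node] - part)
        seen |= part
        parts.append(frozenset(part))
    return parts


def subtrees(edges, n_links):
    """Yield (broken links, resulting pieces) for every choice of n_links links."""
    nodes = {node for edge in edges for node in edge}
    for broken in combinations(edges, n_links):
        kept = [edge for edge in edges if edge not in broken]
        yield broken, components(nodes, kept)


single = list(subtrees(EDGES, 1))
print(len(single), single[0])
```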

Glycan.crossring_subtrees(n_links)[source]

Generate all combinations of cross ring fragments and glycosidic cleavages, cleaving between 1 and n_links monosaccharides paired with n_links - 1 to 0 glycosidic cleavages.

Parameters

n_links (int) – Total number of breaks to create, between cross ring cleavages and complementary glycosidic cleavages.

Yields

Subtree

Sub-Structures

Glycan.substructures(max_cleavages=1, min_cleavages=1, inplace=False)[source]

Generate disjoint subtrees from this glycan by removing one or more monosaccharide-monosaccharide bonds.

Parameters
  • max_cleavages (int) – The maximum number of bonds to break per substructure

  • min_cleavages (int) – The minimum number of bonds to break per substructure

  • min_size (int) – The minimum number of monosaccharides per substructure

Glycan.fragment_to_substructure(fragment)[source]

Extract the substructure of tree which is contained in fragment

Parameters
  • fragment (GlycanFragment) – The GlycanFragment to extract substructure for.

  • tree (Glycan) – The Glycan to extract substructure from.

Returns

The Glycan substructure defined by the nodes contained in fragment as found in tree

Return type

Glycan

Miscellaneous

Glycan.clone(index_method='dfs', visited=None, cls=None)[source]

Create a copy of self, indexed using index_method, a traversal method or None.

Parameters
  • index_method (str) – The indexing method to use when constructing the index of the copied structure

  • visited (set, optional) – A set of nodes to omit traversing through during the copying process

  • cls (type) – A subclass of Glycan, defaulting to __class__

Return type

Glycan

Glycan.set_reducing_end(value)[source]

Sets the reducing end type, and configures the root Monosaccharide appropriately.

If the reducing_end is not None, then the following state changes are made to root:

self.root.ring_start = 0
self.root.ring_end = 0
self.root.anomer = "uncyclized"

Else, the correct state is unknown:

self.root.ring_start = UnknownPosition
self.root.ring_end = UnknownPosition
self.root.anomer = None

Note

This method is called automatically when setting reducing_end, and does not need to be used explicitly.

Glycan objects support root() and tree(), returning root and the object itself, respectively.