Monosaccharide

Represents individual saccharide residues and their associated functions. These are the basic units of structural representation, possessing graph node-like properties.

Monosaccharide Objects

class glypy.structure.monosaccharide.Monosaccharide(anomer=None, configuration=None, stem=None, superclass=None, ring_start=-1, ring_end=-1, modifications=None, links=None, substituent_links=None, composition=None, reduced=None, id=None, fast=False)[source]

Represents a single monosaccharide molecule, and its relationships with other molecules through Link objects. Link objects connecting to other Monosaccharide instances are stored in links, building a Glycan structure as a graph of Monosaccharide objects. Link objects connecting the Monosaccharide instance to Substituent objects are stored in substituent_links.

Both links and substituent_links are instances of OrderedMultiMap objects where the key is the index of the carbon atom in the carbohydrate backbone that hosts the bond. An index of x or -1 represents an unknown location.
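As a rough illustration of this position-keyed storage, the sketch below uses a hypothetical stand-in, PositionMultiMap, rather than glypy's own OrderedMultiMap. The class name, its methods, and the stored strings are assumptions made for this example only.

```python
from collections import OrderedDict

UNKNOWN = -1  # position used when the attachment site is not known


class PositionMultiMap:
    """Hypothetical stand-in: maps a backbone carbon index to several values."""

    def __init__(self):
        self._store = OrderedDict()

    @staticmethod
    def _key(position):
        # Treat 'x' as an alias for the unknown position -1
        return UNKNOWN if position == 'x' else int(position)

    def add(self, position, value):
        self._store.setdefault(self._key(position), []).append(value)

    def __getitem__(self, position):
        # Return a copy so callers cannot mutate the stored list
        return list(self._store.get(self._key(position), []))

    def items(self):
        # Yield one (position, value) pair per stored value
        for key, values in self._store.items():
            for value in values:
                yield key, value


links = PositionMultiMap()
links.add(4, "link to residue A")
links.add(4, "link to residue B")
links.add('x', "link at unknown site")
```

Here links[4] holds both values stored at carbon 4, while links['x'] and links[-1] refer to the same unknown-site bucket.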

Warning

While Monosaccharide objects expose their modifications, links, and substituent_links attributes as mutable, you should treat them as read-only. The methods for altering their contents, add_substituent(), add_monosaccharide(), add_modification(), drop_substituent(), drop_monosaccharide(), and drop_modification() are all responsible for handling these mutations for you. Link methods like Link.apply() and Link.break_link() are used internally.

Variables
  • anomer (Anomer) – An entry of Anomer that corresponds to the linkage type of the carbohydrate backbone. Is an entry of a class based on Enum

  • superclass (SuperClass) – An entry of SuperClass that corresponds to the number of carbons in the carbohydrate backbone of the monosaccharide. Controls the base composition of the instance and the number of positions open to be linked to or modified. Is an entry of a class based on Enum

  • configuration (Configuration or {‘d’, ‘l’, ‘x’, ‘missing’, None}) – An entry of Configuration which corresponds to the optical stereomer state of the instance. Is an entry of a class based on Enum. May possess more than one value.

  • stem (Stem) – Corresponds to the bond conformation of the carbohydrate backbone. Is an entry of a class based on Enum. May possess more than one value.

  • ring_start (int) – The index of the carbon of the carbohydrate backbone that starts a ring. A value of -1, 'x', or None corresponds to an unknown start. A value of 0 refers to a linear chain.

  • ring_end (int) – The index of the carbon of the carbohydrate backbone that ends a ring. A value of -1, 'x', or None corresponds to an unknown end. A value of 0 refers to a linear chain.

  • stereocode (Stereocode) – The stereochemistry of all carbons of the monosaccharide’s backbone ring/chain.

  • reducing_end (ReducedEnd) – The reducing end terminal group of the monosaccharide if the monosaccharide is uncyclized

  • modifications (OrderedMultiMap) – The mapping of sites to Modification entries. Directly modifies the instance’s composition

  • links (OrderedMultiMap) – The mapping of sites to Link entries that refer to other Monosaccharide instances

  • substituent_links (OrderedMultiMap) – The mapping of sites to Link entries that refer to Substituent instances.

  • composition (Composition) – An instance of Composition corresponding to the elemental composition of self and its immediate modifications. If not provided, this will be inferred from field values.

  • reduced (ReducedEnd) – An instance of ReducedEnd, or the value True, represents a reduced sugar. May be inferred from modifications if “aldi” is present

Connection Enumeration

Monosaccharide.parents(links=False)[source]

Returns an iterator over the Monosaccharide instances which are considered the ancestors of self.

links: bool

Whether to return the Link objects, or their parents. Defaults to False

Returns

Monosaccharide.children(links=False)[source]

Returns an iterator over the Monosaccharide instances which are considered the descendants of self.

>>> from glypy import glycans
>>> n_linked_core = glycans["N-Linked Core"]
>>> ch = n_linked_core.root.children()
>>> ch[0]
(4, RES 1b:b-dglc-HEX-1:5 2s:n-acetyl LIN 1:1d(2+1)2n)
Parameters

links (bool) – Whether to return the Link objects, or their children. Defaults to False

Returns

Monosaccharide.substituents()[source]

Returns an iterator over all substituents attached to self by a Link object stored in substituent_links

Returns

  • list of

  • position (int) – Location of the bond to the substituent

  • substituent (Substituent) – Substituent at position

Adding and Removing Connections and Modifications

Monosaccharide.add_monosaccharide(monosaccharide, position=-1, max_occupancy=0, child_position=-1, parent_loss=None, child_loss=None)[source]

Adds a Monosaccharide and associated Link to links at the site given by position.

>>> from glypy import monosaccharides
>>> hexnac = monosaccharides.HexNAc
>>> hex = monosaccharides.Hex
>>> hexnac.add_monosaccharide(hex, 1)
RES 1b:x-xx-HEX-1:5 2s:n-acetyl LIN 1:1d(2+1)2n
>>> hexnac.links[1][0].child
RES 1b:x-xx-HEX-1:5
Parameters
  • monosaccharide (Monosaccharide) – The monosaccharide to add.

  • position (int or 'x') – The location to add the Monosaccharide link to links. Defaults to -1

  • child_position (int) – The location to add the link to in monosaccharide’s links. Defaults to -1.

  • max_occupancy (int, optional) – The maximum number of items acceptable at position. Defaults to 0, as given in the signature.

  • parent_loss (Composition or str) – The elemental composition removed from self

  • child_loss (Composition or str) – The elemental composition removed from monosaccharide

Raises
  • IndexError – If position exceeds the bounds set by superclass.

  • ValueError – If position is occupied by more than max_occupancy elements.

Returns

self, for chain calls

Return type

Monosaccharide
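The bounds check, the occupancy check, and the chained return value described above can be sketched with a toy class. ToyResidue and add_child are hypothetical names for this illustration and do not reproduce glypy's implementation; in particular, no Link object or composition loss is modelled.

```python
class ToyResidue:
    """Hypothetical residue with a fixed number of backbone carbons."""

    def __init__(self, name, carbons=6):
        self.name = name
        self.carbons = carbons
        self.links = {}  # position -> list of attached ToyResidue objects

    def add_child(self, child, position=-1, max_occupancy=0):
        # -1 means "unknown site"; any other site must lie on the backbone
        if position != -1 and not 1 <= position <= self.carbons:
            raise IndexError("position %r is outside the backbone" % (position,))
        occupants = self.links.get(position, [])
        # Reject the bond when the site already holds more than max_occupancy
        if len(occupants) > max_occupancy:
            raise ValueError("position %r is already occupied" % (position,))
        self.links.setdefault(position, []).append(child)
        return self  # returning self allows chained calls


root = ToyResidue("HexNAc")
root.add_child(ToyResidue("Hex"), 4).add_child(ToyResidue("Fuc"), 6)

try:
    root.add_child(ToyResidue("Man"), 4)  # site 4 already holds one residue
except ValueError as err:
    rejected = str(err)
```

With max_occupancy=0, an empty site passes the check (0 occupants is not more than 0) and a site holding one residue is rejected, which is why the second attachment at position 4 fails.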

Monosaccharide.add_substituent(substituent, position=-1, max_occupancy=0, child_position=1, parent_loss=None, child_loss=None)[source]

Adds a Substituent and associated Link to substituent_links at the site given by position. This new substituent is included when calculating mass with substituents included.

>>> from glypy import monosaccharides
>>> hex = monosaccharides.Hex
>>> hexnac = monosaccharides.HexNAc
>>> hex.add_substituent("n-acetyl", 2, parent_loss="OH")
RES 1b:x-xx-HEX-1:5 2s:n-acetyl LIN 1:1d(2+1)2n
>>> hexnac == hex
True
Parameters
  • substituent (str or Substituent) – The substituent to add. If passed a str it will be translated into an instance of Substituent.

  • position (int or 'x') – The location to add the Substituent link to substituent_links. Defaults to -1

  • child_position (int) – The location to add the link to in substituent’s links. Defaults to 1, as given in the signature. Substituent indices are currently not checked.

  • max_occupancy (int, optional) – The maximum number of items acceptable at position. Defaults to 0, as given in the signature.

  • parent_loss (Composition or str) – The elemental composition removed from self

  • child_loss (Composition or str) – The elemental composition removed from substituent

Raises
  • IndexError – If position exceeds the bounds set by superclass.

  • ValueError – If position is occupied by more than max_occupancy elements.

Returns

self, for chain calls

Return type

Monosaccharide

Monosaccharide.add_modification(modification, position, max_occupancy=0)[source]

Adds a modification instance to modifications at the site given by position. This directly modifies composition, consequently changing mass()

Parameters
  • modification (str or Modification) – The modification to add. If passed a str, it will be translated into an instance of Modification

  • position (int or 'x') – The location to add the Modification to.

  • max_occupancy (int, optional) – The maximum number of items acceptable at position. Defaults to 0, as given in the signature.

Raises
  • IndexError – If position exceeds the bounds set by superclass.

  • ValueError – If position is occupied by more than max_occupancy elements.

Returns

self, for chain calls

Return type

Monosaccharide

Monosaccharide.drop_monosaccharide(position, refund=True)[source]

Remove the glycosidic bond at position, detaching a connected Monosaccharide.

If there is more than one glycosidic bond at position, an error will be raised.

>>> from glypy import glycans
>>> n_linked_core = glycans["N-Linked Core"]
>>> n_linked_core.root.drop_monosaccharide(4)
RES 1b:b-dglc-HEX-1:5 2s:n-acetyl LIN 1:1d(2+1)2n
>>> n_linked_core.mass()
221.08993720321
Parameters
  • position (int) – The position to drop the monosaccharide from

  • refund (bool) – Passed to break_link()

Raises
  • IndexError: – If position is not a valid carbohydrate backbone position

  • ValueError: – If no Link or more than one Link is found at position

Returns

self, for chain calls

Return type

Monosaccharide

Monosaccharide.drop_substituent(position, substituent=None, refund=True)[source]

Remove the substituent at position.

If substituent is None, then the first substituent found at position is removed.

>>> from glypy import monosaccharides
>>> hex = monosaccharides.Hex
>>> hexnac = monosaccharides.HexNAc
>>> hexnac.drop_substituent(2)
RES 1b:x-xx-HEX-1:5
>>> hexnac == hex
True
Parameters
  • position (int) – The position to drop the substituent from

  • substituent (Substituent) – The Substituent to remove. If None, the first substituent found at position will be removed

  • refund (bool) – Passed to break_link()

Raises
  • IndexError: – If position is not a valid carbohydrate backbone position

  • ValueError: – If substituent is not found at position

Returns

self, for chain calls

Return type

Monosaccharide

Monosaccharide.drop_modification(position, modification)[source]

Remove the modification at position

Parameters
  • position (int) – The position to drop the modification from

  • modification (Modification) – The Modification to remove.

Raises
  • IndexError: – If position is not a valid carbohydrate backbone position

  • ValueError: – If modification is not found at position

Returns

self, for chain calls

Return type

Monosaccharide

Position Occupancy

Monosaccharide.is_occupied(position)[source]

Checks to see if a particular backbone position is occupied by a Modification, Substituent, or Link to another Monosaccharide.

Parameters

position (int) – The position to check for occupancy. Passing -1 checks for undetermined attachments.

Returns

The number of occupants at position

Return type

int

Raises

IndexError: – When the position is less than 1 or exceeds the limits of the carbohydrate backbone’s size.

Monosaccharide.open_attachment_sites(max_occupancy=0)[source]

When attaching Monosaccharide instances to other objects, bonds are formed between the carbohydrate backbone and the other object. If a site is already bound, the occupying object fills that space on the backbone and prevents other objects from binding there.

Currently only cares about the availability of the hydroxyl group. As there is not a hydroxyl attached to the ring-ending carbon, that should not be considered an open site.

If any existing attached units have unknown positions, no known positions can be reported. In that case the list of open positions is a list of -1 values, one for each open site.

Parameters

max_occupancy (int) – The number of objects that may already be bound at a site before it is considered unavailable for attachment.

Returns

  • list – The positions open for binding

  • int – The number of bound but unknown locations on the backbone.
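The behaviour described for open_attachment_sites() can be approximated by a small standalone function. This is a sketch under stated assumptions, not glypy's code: open_sites, its parameters, and the occupancy dictionary are hypothetical, and only the three rules from the text (occupancy limit, excluded ring-ending carbon, -1 placeholders when any attachment site is unknown) are modelled.

```python
def open_sites(carbons, ring_end, occupancy, unknown=0, max_occupancy=0):
    """Return (open positions, number of bound-but-unknown locations).

    occupancy maps a backbone position to the number of objects bound there.
    """
    candidates = [
        pos for pos in range(1, carbons + 1)
        # The ring-ending carbon has no free hydroxyl, so it is never open
        if pos != ring_end and occupancy.get(pos, 0) <= max_occupancy
    ]
    if unknown:
        # Some attachments sit at unknown sites, so no position can be named;
        # report one -1 placeholder per candidate site instead
        return [-1] * len(candidates), unknown
    return candidates, unknown


# A six-carbon residue with its ring closed at carbon 5 and one bond at carbon 1
known = open_sites(6, 5, {1: 1})
with_unknown = open_sites(6, 5, {1: 1}, unknown=1)
```

In this toy model, known lists carbons 2, 3, 4, and 6 as open, while with_unknown replaces each of those entries with -1 and reports one unknown attachment.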

Monosaccharide.total_attachement_sites()[source]
Monosaccharide.occupied_attachment_sites()[source]

Equality Comparison

Monosaccharide objects support equality comparison operators, == and !=. They also support hashing, using the hash() value of Monosaccharide.id.

Monosaccharide.exact_ordering_equality(other, substituents=True, visited=None)[source]

Performs equality testing between two monosaccharides where the exact position (and ordering by sort) of links must match between the input Monosaccharide objects.

Return type

bool

Monosaccharide.topological_equality(other, substituents=True, visited=None)[source]

Performs equality testing between two monosaccharides where the exact ordering of child links does not have to match between the input Monosaccharide objects, so long as an exact match of the subtrees is found.

Return type

bool
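The difference between the two comparisons can be illustrated on a toy tree. Node, exact_equal, and topological_equal are hypothetical names for this sketch; it assumes that children are stored as (position, child) pairs and that positions must still agree in the order-insensitive comparison, which may not match glypy's behaviour in every detail.

```python
class Node:
    """Hypothetical tree node: a label plus (position, child) pairs."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)


def exact_equal(a, b):
    # Children must appear at the same positions and in the same order
    if a.label != b.label or len(a.children) != len(b.children):
        return False
    return all(
        pa == pb and exact_equal(ca, cb)
        for (pa, ca), (pb, cb) in zip(a.children, b.children)
    )


def topological_equal(a, b):
    # Children may appear in any order, as long as every subtree of a
    # can be paired with a distinct matching subtree of b
    if a.label != b.label or len(a.children) != len(b.children):
        return False
    unmatched = list(b.children)
    for pa, ca in a.children:
        for i, (pb, cb) in enumerate(unmatched):
            if pa == pb and topological_equal(ca, cb):
                del unmatched[i]  # each subtree of b is used at most once
                break
        else:
            return False  # no remaining subtree of b matches this child
    return True


first = Node("Man", [(3, Node("Man")), (6, Node("GlcNAc"))])
second = Node("Man", [(6, Node("GlcNAc")), (3, Node("Man"))])
```

The two trees hold the same subtrees in a different order, so exact_equal(first, second) is False while topological_equal(first, second) is True.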

Monosaccharide.__eq__(other)[source]

Test for equality between Monosaccharide instances. First try scalar equality of fields, and then compare descendants.

Monosaccharide.__hash__()[source]

Return hash(self).

Serialization

Monosaccharide.serialize(name='glycoct')[source]

Convert this object into text using the requested textual encoding

Parameters

name (str, optional) – The name of the textual encoding, e.g. “glycoct” or “iupac”

Return type

str

classmethod Monosaccharide.register_serializer(name, method)[source]

Add method as name to the set of serializers to pick from in serialize()

Parameters
  • name (str) – The name of the serializer

  • method (Callable) – A callable object that when called with a Monosaccharide returns a str

classmethod Monosaccharide.available_serializers()[source]

Get the list of available serialization formats

Return type

list of str
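The registry pattern behind serialize(), register_serializer(), and available_serializers() can be sketched as follows. ToySerializable and the "plain" and "upper" formats are hypothetical and exist only for this illustration; the real class registers formats such as “glycoct”.

```python
class ToySerializable:
    """Hypothetical class holding a name-to-callable serializer registry."""

    _serializers = {}

    def __init__(self, name):
        self.name = name

    @classmethod
    def register_serializer(cls, name, method):
        # method must accept an instance and return a str
        cls._serializers[name] = method

    @classmethod
    def available_serializers(cls):
        return list(cls._serializers)

    def serialize(self, name="plain"):
        try:
            method = self._serializers[name]
        except KeyError:
            raise KeyError("no serializer registered under %r" % (name,))
        return method(self)


ToySerializable.register_serializer("plain", lambda residue: residue.name)
ToySerializable.register_serializer("upper", lambda residue: residue.name.upper())

text = ToySerializable("Hex").serialize("upper")
```

Keeping the registry on the class means a format registered once is available to every instance.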

Mass Spectrometry Utilities

Monosaccharide.total_composition()[source]

Computes the sum of the composition of self and each of its linked Substituents

Return type

Composition

Monosaccharide.mass(average=False, charge=0, mass_data=None, substituents=True)[source]

Calculates the total mass of self.

Parameters
  • average (bool, optional, defaults to False) – Whether or not to use the average isotopic composition when calculating masses. When average == False, masses are calculated using monoisotopic mass.

  • charge (int, optional, defaults to 0) – If charge is non-zero, m/z is calculated, where m is the theoretical mass, and z is charge

  • mass_data (dict, optional) – If mass_data is None, standard NIST mass and isotopic abundance data are used. Otherwise the contents of mass_data are assumed to contain elemental mass and isotopic abundance information. Defaults to None.

  • substituents (bool, optional, defaults to True) – Whether or not to include substituents’ masses.

Return type

float
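To make the mass arithmetic concrete, the sketch below sums monoisotopic elemental masses for a HexNAc-like composition. toy_mass and the MONOISOTOPIC table are hypothetical helpers, and the m/z step is a simplification that divides by the absolute charge without adding or removing any charge-carrier mass; glypy's own calculation may differ.

```python
# Monoisotopic masses of the most abundant isotopes, in daltons
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}


def toy_mass(composition, charge=0, mass_data=None):
    """Sum elemental masses; divide by |charge| when a charge is given."""
    table = MONOISOTOPIC if mass_data is None else mass_data
    total = sum(table[element] * count for element, count in composition.items())
    if charge != 0:
        # Simplification: no charge-carrier mass is added or removed here
        return total / abs(charge)
    return total


hexnac = {"C": 8, "H": 15, "N": 1, "O": 6}
neutral = toy_mass(hexnac)             # neutral monoisotopic mass
doubly_charged = toy_mass(hexnac, 2)   # toy m/z for a charge of 2
```

The neutral value agrees, to within floating-point rounding, with the 221.08993720321 printed in the drop_monosaccharide() example, where a single N-acetylhexosamine residue remains.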

Miscellaneous

Monosaccharide.clone(prop_id=False, fast=True, monosaccharide_type=None)[source]

Copies just this Monosaccharide and its Substituent objects, creating a separate instance with the same data. All mutable data structures are duplicated and distinct from the original.

Does not copy any links as this would cause recursive duplication of the entire Glycan graph.

Parameters
  • prop_id (bool) – Whether to copy id from self to the new instance

  • fast (bool) – Whether to use the fast-path initialization process in Monosaccharide.__init__()

  • monosaccharide_type (type) – A subclass of Monosaccharide to use

Return type

Monosaccharide
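The copy semantics described above (independent mutable data, no links, optional id propagation) can be sketched with a toy node. ToyNode and its attributes are hypothetical, and substituent copying is left out of this illustration.

```python
import copy


class ToyNode:
    """Hypothetical node with local data and links to neighbouring nodes."""

    def __init__(self, name, modifications=None, id=None):
        self.name = name
        self.modifications = dict(modifications or {})
        self.links = {}  # position -> neighbouring ToyNode
        self.id = id

    def clone(self, prop_id=False):
        # Duplicate the node's own mutable data so the copy is independent
        duplicate = ToyNode(self.name, copy.deepcopy(self.modifications))
        if prop_id:
            duplicate.id = self.id
        # links is deliberately left empty: following it would copy the graph
        return duplicate


original = ToyNode("Hex", {2: ["d"]}, id=7)
original.links[4] = ToyNode("HexNAc")

duplicate = original.clone()
duplicate.modifications[2].append("keto")  # does not affect the original
```

After the clone, changes to the duplicate's modifications leave the original untouched, and the duplicate starts with no links and, unless prop_id is set, no id.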

Explicit Uncyclized Reducing Ends and Labels

class glypy.structure.monosaccharide.ReducedEnd(composition=None, substituents=None, valence=1, id=None)[source]

Represents the composition shift and conformation change created by reducing a Monosaccharide.

Variables
  • composition (Composition) – The elemental composition of the reducing end reduction modification.

  • links (OrderedMultiMap) – The attached substituents

  • valence (int) – Number of substituents this node can host

  • id (int) – Unique identifier

There is also a class attribute, name, for comparison with “aldi”.

add_substituent(substituent, position=-1, max_occupancy=0, child_position=1, parent_loss=None, child_loss=None)[source]

Adds a Substituent and associated Link to substituent_links at the site given by position. This new substituent is included when calculating mass with substituents included

Parameters
  • substituent (str or Substituent) – The substituent to add. If passed a str, it will be translated into an instance of Substituent

  • position (int or 'x') – The location to add the Substituent link to substituent_links. Defaults to -1

  • child_position (int) – The location to add the link to in substituent’s links. Defaults to 1, as given in the signature. Substituent indices are currently not checked.

  • max_occupancy (int, optional) – The maximum number of items acceptable at position. Defaults to 0, as given in the signature.

  • parent_loss (Composition or str) – The elemental composition removed from self

  • child_loss (Composition or str) – The elemental composition removed from substituent

Raises
  • IndexError – If position exceeds the bounds set by superclass.

  • ValueError – If position is occupied by more than max_occupancy elements.

children()[source]

Returns an iterator over the nodes which are considered the descendants of self.

clone(prop_id=True)[source]

Make a deep copy of self.

Parameters

prop_id (bool) – Whether to copy over id.

Return type

ReducedEnd

drop_substituent(position, substituent=None, refund=True)[source]

Remove the substituent at position.

If substituent is None, then the first substituent found at position is removed.

Parameters
  • position (int) – The position to drop the substituent from

  • substituent (Substituent) – The Substituent to remove. If None, the first substituent found at position will be removed

  • refund (bool) – Passed to break_link()

Raises
  • IndexError: – If position exceeds valence

  • ValueError: – If substituent is not found at position

Returns

self for chaining calls

Return type

ReducedEnd

is_occupied(position)[source]

Checks to see if a particular position is occupied by a Substituent.

Parameters

position (int) – The position to check for occupancy. Passing -1 checks for undetermined attachments.

Returns

The number of occupants at position, or float('inf') if position exceeds valence

Return type

numeric

mass(average=False, charge=0, mass_data=None)[source]

Calculates the total mass of self.

Parameters
  • average (bool, optional, defaults to False) – Whether or not to use the average isotopic composition when calculating masses. When average == False, masses are calculated using monoisotopic mass.

  • charge (int, optional, defaults to 0) – If charge is non-zero, m/z is calculated, where m is the theoretical mass, and z is charge

  • mass_data (dict, optional) – If mass_data is None, standard NIST mass and isotopic abundance data are used. Otherwise the contents of mass_data are assumed to contain elemental mass and isotopic abundance information. Defaults to None.

Return type

float

total_composition()[source]

Computes the sum of the composition of self and each of its linked Substituent objects

Return type

Composition