Glycan Compositions and Residues

GlycanComposition, MonosaccharideResidue, and SubstituentResidue are useful for working with bags of residues where topology and connections are not relevant but the aggregate composition is known. These types work with a subset of the IUPAC three-letter code for specifying compositions.

A Monosaccharide is meant to precisely describe where all of the bonds from the carbon backbone are. A MonosaccharideResidue abstracts away the notion of position and automatically deducts a water molecule from its composition to account for a single incoming and a single outgoing glycosidic bond. Because it does not try to completely describe the physical configuration of the molecule, MonosaccharideResidue removes information about ring type, anomericity, configuration, and optionally stem type. The level of detail discarded is customizable in the MonosaccharideResidue.from_monosaccharide() class method.
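The water deduction can be illustrated with plain element counts. The following is a minimal pure-Python sketch, not glypy's implementation: the helper names (formula_mass, residue_formula) are hypothetical, and the element masses are standard monoisotopic values.

```python
# Monoisotopic element masses (standard reference values, in Da).
ELEMENT_MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
WATER = {"H": 2, "O": 1}


def formula_mass(formula):
    """Monoisotopic mass of an {element: count} mapping."""
    return sum(ELEMENT_MASS[element] * count for element, count in formula.items())


def subtract(formula, loss):
    """Element-wise difference of two {element: count} mappings."""
    result = dict(formula)
    for element, count in loss.items():
        result[element] = result.get(element, 0) - count
    return result


def residue_formula(monosaccharide_formula):
    """Hypothetical helper: one incoming plus one outgoing glycosidic bond releases one water."""
    return subtract(monosaccharide_formula, WATER)


hexose = {"C": 6, "H": 12, "O": 6}         # free hexose, C6H12O6
hex_residue = residue_formula(hexose)       # C6H10O5
print(round(formula_mass(hexose), 4))       # 180.0634
print(round(formula_mass(hex_residue), 4))  # 162.0528
```

The 18.0106 Da difference between the two printed values is exactly one water, which is what a MonosaccharideResidue removes.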

A GlycanComposition is just a bag of MonosaccharideResidue and SubstituentResidue objects, similar to Composition. Its keys may be MonosaccharideResidue instances, SubstituentResidue instances, or strings which can be parsed by from_iupac_lite(), and its values are integers. A GlycanComposition may be written to a string with serialize() and read back from one with parse().

>>> from glypy.structure.glycan_composition import GlycanComposition, MonosaccharideResidue
>>> g = GlycanComposition(Hex=3, HexNAc=2)
>>> g["Hex"]
3
>>> r = MonosaccharideResidue.from_iupac_lite("Hex")
>>> r
MonosaccharideResidue(Hex)
>>> g[r]
3
>>> import glypy
>>> abs(g.mass() - glypy.motifs["N-Glycan core basic 1"].mass()) < 1e-5
True
>>> g2 = GlycanComposition(Hex=5)
>>> g["@n-acetyl"] = -2 # Remove two n-acetyl groups from the composition
>>> abs(g.mass() - g2.mass()) < 1e-5
True

IUPAClite

IUPAClite is a dialect of IUPAC for describing monosaccharides while omitting some precise structural information from the grammar. It also includes a compact line notation for glycan compositions.

A monosaccharide is denoted using IUPAC notation, omitting ring shape, anomeric state, chirality, and (optionally) modification positions. For example, a-D-Manp would be written Man, and b-D-Glcp2NAc would be written GlcNAc.

You can also use generic base types such as Hex or Pen to denote a six- or five-carbon monosaccharide. The notation is composable, so you can specify an arbitrarily modified monosaccharide: HexNAc(S) denotes a sulfated HexNAc, using the parenthesized convention for separate substituent groups, and dHexN denotes a deoxy-hexosamine.

You can also define “floating” substituent groups by prefixing their full lowercase names with an @-sign, like @sulfate for sulfate or @acetyl for an acetyl group. Lastly, it is also possible to denote an arbitrary named group using the notation #<name>#<chemical-formula>, though this should be used only when no other option is available.
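The three token forms described above (a monosaccharide with optional parenthesized substituents, an @-prefixed floating substituent, and a #name#formula group) can be told apart with simple string checks. This is a hypothetical sketch, not glypy's from_iupac_lite(): it only splits tokens and does not validate base types. The comma separator for multiple substituents is an assumption.

```python
import re

# Base name, optionally followed by one parenthesized substituent list.
_MONOSACCHARIDE = re.compile(r"^(?P<base>[^()@#]+)(?:\((?P<subs>[^()]*)\))?$")


def classify_token(token):
    """Hypothetical helper: sort an IUPAClite token into one of three forms."""
    if token.startswith("@"):
        # Floating substituent: "@sulfate" -> "sulfate"
        return ("substituent", token[1:])
    if token.startswith("#"):
        # Arbitrary named group: "#<name>#<chemical-formula>"
        parts = token.split("#")
        if len(parts) != 3 or not parts[1] or not parts[2]:
            raise ValueError("expected #<name>#<formula>, got %r" % token)
        return ("named_group", parts[1], parts[2])
    match = _MONOSACCHARIDE.match(token)
    if match is None:
        raise ValueError("unrecognized IUPAClite token: %r" % token)
    subs = match.group("subs")
    # Assumption: several substituents inside the parentheses are comma-separated.
    substituents = tuple(s.strip() for s in subs.split(",") if s.strip()) if subs else ()
    return ("monosaccharide", match.group("base"), substituents)


print(classify_token("HexNAc(S)"))  # ('monosaccharide', 'HexNAc', ('S',))
print(classify_token("@sulfate"))   # ('substituent', 'sulfate')
```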

See from_iupac_lite() and to_iupac_lite() for the implementations of monomer reading and writing.

A glycan composition is written as one or more <monosaccharide>:<count> occurrences separated by a “; ” (semi-colon + space), enclosed in “{ }”. See GlycanComposition.parse() and GlycanComposition.serialize() for implementation. A few examples are shown below:

IUPAClite glycan compositions
{Hex:5; HexNAc:4; Neu5Ac:1}
{Hex:5; HexNAc:4; Neu5Ac:2}
{Fuc:1; Hex:5; HexNAc:4; Neu5Ac:2}
{Fuc:2; Hex:6; HexNAc:5; Neu5Ac:1}
{Fuc:1; Hex:6; HexNAc:5; Neu5Ac:2}
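The brace-and-semicolon format is simple enough to read and write with string operations. The following is a hypothetical pure-Python sketch of the format, not GlycanComposition.parse() or serialize(). It produces plain dicts of strings instead of residue objects, and the alphabetical key order on output is an assumption that happens to match the examples above.

```python
def parse_composition(text):
    """Parse "{Name:count; Name:count}" into a dict of counts."""
    text = text.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("composition must be enclosed in braces: %r" % text)
    counts = {}
    body = text[1:-1].strip()
    if not body:
        return counts
    for item in body.split("; "):
        name, sep, count = item.rpartition(":")
        if not sep or not name:
            raise ValueError("expected <monosaccharide>:<count>, got %r" % item)
        counts[name] = counts.get(name, 0) + int(count)
    return counts


def serialize_composition(counts):
    """Write counts back out; sorting keys alphabetically is an assumption."""
    items = ("%s:%d" % (name, count) for name, count in sorted(counts.items()))
    return "{" + "; ".join(items) + "}"


text = "{Fuc:1; Hex:5; HexNAc:4; Neu5Ac:2}"
counts = parse_composition(text)
print(counts)                                # {'Fuc': 1, 'Hex': 5, 'HexNAc': 4, 'Neu5Ac': 2}
print(serialize_composition(counts) == text)  # True
```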

Residues

class glypy.structure.glycan_composition.MonosaccharideResidue(*args, **kwargs)

Represents a Monosaccharide-like object, except that it does not connect to other Monosaccharide objects and does not have properties related to topology, specifically the anomeric state.

A single MonosaccharideResidue has lost a water molecule from its composition, reflecting its residual nature. This is accounted for when dealing with aggregates of residues. Residues also have altered carbon backbone occupancies.

MonosaccharideResidue objects are hashable and comparable on their iupac_lite representation, which is given by __str__() or name().
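Hashing and comparing on the name string is what allows both g["Hex"] and g[r] to find the same entry in the example at the top of this page. This is a hypothetical stand-in class (ResidueKey), not glypy's MonosaccharideResidue. It shows only the hashing contract this behavior relies on.

```python
class ResidueKey:
    """Hypothetical stand-in whose identity is its IUPAClite name."""

    __slots__ = ("_name",)

    def __init__(self, name):
        self._name = name

    def name(self):
        return self._name

    def __str__(self):
        return self._name

    def __repr__(self):
        return "ResidueKey(%s)" % self._name

    def __hash__(self):
        # Same hash as the plain string, so either form finds the same dict entry.
        return hash(self._name)

    def __eq__(self, other):
        if isinstance(other, ResidueKey):
            return self._name == other._name
        if isinstance(other, str):
            return self._name == other
        return NotImplemented


counts = {ResidueKey("Hex"): 3, ResidueKey("HexNAc"): 2}
print(counts["Hex"])              # 3
print(counts[ResidueKey("Hex")])  # 3
```

Equal objects must have equal hashes for dict lookup to work, which is why __hash__ delegates to the same string that __eq__ compares.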

clone(*args, **kwargs)

Copies just this Monosaccharide and its Substituent objects, creating a separate instance with the same data. All mutable data structures are duplicated and distinct from the original.

Does not copy any links as this would cause recursive duplication of the entire Glycan graph.

Parameters
  • prop_id (bool) – Whether to copy id from self to the new instance

  • fast (bool) – Whether to use the fast-path initialization process in MonosaccharideResidue.__init__()

  • monosaccharide_type (type) – A subclass of MonosaccharideResidue to use

Return type

MonosaccharideResidue

copy_underivatized()

Create a copy of this residue without derivatization.

Return type

MonosaccharideResidue

drop_configuration(force=False)

Drops the absolute stereochemical configuration of this monosaccharide.

Unless force is True, if resolve_special_base_type() returns a truthy value, this function will do nothing.

Parameters
  • residue (Monosaccharide) – The monosaccharide to change

  • force (bool, optional) – Whether or not to override known special case named monosaccharides

Returns

The mutated monosaccharide

Return type

Monosaccharide

drop_positions(force=False)

Drops the position classifiers from all links and modifications attached to this monosaccharide.

Unless force is True, if resolve_special_base_type() returns a truthy value, this function will do nothing.

Parameters
  • residue (Monosaccharide) – The monosaccharide to change

  • force (bool, optional) – Whether or not to override known special case named monosaccharides

Returns

The mutated monosaccharide

Return type

Monosaccharide

drop_stem(force=False)

Drops the stem, or the carbon ring stereochemical classification, from this monosaccharide.

Unless force is True, if resolve_special_base_type() returns a truthy value, this function will do nothing.

Parameters
  • residue (Monosaccharide) – The monosaccharide to change

  • force (bool, optional) – Whether or not to override known special case named monosaccharides

Returns

The mutated monosaccharide

Return type

Monosaccharide

classmethod from_monosaccharide(monosaccharide, configuration=False, stem=True, ring=False)

Construct an instance of MonosaccharideResidue from an instance of Monosaccharide. This function attempts to preserve derivatization if possible.

This function will create a deep copy of monosaccharide.

Parameters
  • monosaccharide (Monosaccharide) – The monosaccharide to be converted

  • configuration (bool, optional) – Whether or not to preserve Configuration. Defaults to False

  • stem (bool, optional) – Whether or not to preserve Stem. Defaults to True

  • ring (bool, optional) – Whether or not to preserve RingType. Defaults to False

Return type

MonosaccharideResidue

name()

Name this object according to iupac_lite.

Return type

str

See also

to_iupac_lite()

open_attachment_sites(max_occupancy=0)

When attaching Monosaccharide instances to other objects, bonds are formed between the carbohydrate backbone and the other object. If a site is already bound, the occupying object fills that space on the backbone and prevents other objects from binding there.

Currently only cares about the availability of the hydroxyl group. As there is not a hydroxyl attached to the ring-ending carbon, that should not be considered an open site.

If any attached units have unknown positions, no specific positions can be reported; instead, the list of open positions contains one -1 for each open site.

A MonosaccharideResidue has two fewer open attachment sites than the equivalent Monosaccharide.

Parameters

max_occupancy (int) – The number of objects that may already be bound at a site before it is considered unavailable for attachment.

Returns

  • list – The positions open for binding

  • int – The number of bound but unknown locations on the backbone.
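The bookkeeping described above can be sketched with a mapping from backbone position to occupant count. This is a hypothetical pure-Python illustration, not glypy's implementation. In particular, modeling a residue's two glycosidic bonds as occupants at unknown positions is an assumption made to reproduce the "two fewer open sites" behavior.

```python
UNKNOWN = -1  # position marker for a bond whose location is not known


def open_attachment_sites(hydroxyl_positions, occupancy, max_occupancy=0):
    """Hypothetical sketch returning (open positions, number of bound-but-unknown locations).

    hydroxyl_positions: backbone positions carrying a hydroxyl; the
        ring-ending carbon is excluded by the caller.
    occupancy: mapping of position -> number of bound objects, using
        UNKNOWN as the key for bonds at unknown positions.
    """
    unknown = occupancy.get(UNKNOWN, 0)
    open_positions = [
        position for position in hydroxyl_positions
        if occupancy.get(position, 0) <= max_occupancy
    ]
    if unknown:
        # Specific positions can't be reported, so return one -1 per open site.
        return [UNKNOWN] * max(len(open_positions) - unknown, 0), unknown
    return open_positions, unknown


# A pyranose hexose closes its ring at C5, leaving hydroxyls at 1, 2, 3, 4 and 6.
hydroxyls = [1, 2, 3, 4, 6]
print(open_attachment_sites(hydroxyls, {}))            # ([1, 2, 3, 4, 6], 0)
print(open_attachment_sites(hydroxyls, {4: 1}))        # ([1, 2, 3, 6], 0)
# Residue-style: two glycosidic bonds at unknown positions.
print(open_attachment_sites(hydroxyls, {UNKNOWN: 2}))  # ([-1, -1, -1], 2)
```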

residue_name()

Name this object according to iupac_lite, omitting any derivatization.

Return type

str

Frozen Residues

MonosaccharideResidue operations may require str conversions, which can be expensive. When residues will not be modified, use FrozenMonosaccharideResidue instead, which is immutable once created and substantially faster.

class glypy.structure.glycan_composition.FrozenMonosaccharideResidue(*args, **kwargs)

A subclass of MonosaccharideResidue which caches the result of to_iupac_lite() and instances returned by FrozenMonosaccharideResidue.clone() and FrozenMonosaccharideResidue.from_iupac_lite(). Also treated as immutable after initialization through FrozenMonosaccharideResidue.from_monosaccharide().

Note that calling FrozenMonosaccharideResidue.from_monosaccharide() directly will not retrieve instances from the cache, and creating an instance through normal initialization will neither touch the cache nor freeze the instance.

This type is intended for use with FrozenGlycanComposition to minimize the number of times from_iupac_lite() is called.

clone(*args, **kwargs)

Copies just this Monosaccharide and its Substituent objects, creating a separate instance with the same data. All mutable data structures are duplicated and distinct from the original.

Does not copy any links as this would cause recursive duplication of the entire Glycan graph.

Parameters
  • prop_id (bool) – Whether to copy id from self to the new instance

  • fast (bool) – Whether to use the fast-path initialization process in Monosaccharide.__init__()

  • monosaccharide_type (type) – A subclass of Monosaccharide to use

Return type

Monosaccharide

classmethod from_iupac_lite(string)

Parse a string of iupac_lite notation to produce a residue object

Parameters

string (str) – The string to parse

Return type

ResidueBase

classmethod from_monosaccharide(monosaccharide, *args, **kwargs)

Construct an instance of MonosaccharideResidue from an instance of Monosaccharide. This function attempts to preserve derivatization if possible.

This function will create a deep copy of monosaccharide.

Parameters
  • monosaccharide (Monosaccharide) – The monosaccharide to be converted

  • configuration (bool, optional) – Whether or not to preserve Configuration. Defaults to False

  • stem (bool, optional) – Whether or not to preserve Stem. Defaults to True

  • ring (bool, optional) – Whether or not to preserve RingType. Defaults to False

Return type

MonosaccharideResidue

mass(average=False, charge=0, mass_data=None, substituents=True)

Calculates the total mass of self.

Parameters
  • average (bool, optional, defaults to False) – Whether or not to use the average isotopic composition when calculating masses. When average == False, masses are calculated using monoisotopic mass.

  • charge (int, optional, defaults to 0) – If charge is non-zero, m/z is calculated, where m is the theoretical mass, and z is charge

  • mass_data (dict, optional) – If mass_data is None, standard NIST mass and isotopic abundance data are used. Otherwise the contents of mass_data are assumed to contain elemental mass and isotopic abundance information. Defaults to None.

  • substituents (bool, optional, defaults to True) – Whether or not to include substituents’ masses.

Return type

float
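For a non-zero charge, the neutral mass is converted to m/z. The following minimal sketch assumes the charge comes from adding protons (positive charge) or removing them (negative charge). That carrier is an assumption, and mass_to_charge is a hypothetical helper, not part of glypy.

```python
PROTON = 1.00727646677  # proton mass in Da; the charge carrier assumed here


def mass_to_charge(neutral_mass, charge, carrier=PROTON):
    """Hypothetical helper: convert a neutral mass to m/z via added/removed carriers."""
    if charge == 0:
        return neutral_mass
    return (neutral_mass + charge * carrier) / abs(charge)


hex_residue_mass = 162.0528234185  # C6H10O5, monoisotopic
print(round(mass_to_charge(hex_residue_mass, 1), 4))   # 163.0601
print(round(mass_to_charge(hex_residue_mass, -1), 4))  # 161.0455
```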

total_composition()

Computes the sum of the composition of self and each of its linked Substituents

Return type

Composition

Substituent Residues

class glypy.structure.glycan_composition.SubstituentResidue(name, composition=None, id=None, links=None, can_nh_derivatize=None, is_nh_derivatizable=None, derivatize=False, attachment_composition=None)

Represent substituent molecules unassociated with a specific monosaccharide residue.

Note

SubstituentResidue’s composition value includes the losses for forming a bond between a monosaccharide residue and the substituent.

classmethod from_iupac_lite(name)

Parse a string of iupac_lite notation to produce a residue object

Parameters

name (str) – The string to parse

Return type

ResidueBase

sigil = '@'

All substituent string identifiers are prefixed with this character for the from_iupac_lite() parser

to_iupac_lite()

Encode this residue using iupac_lite notation.

Return type

str

Glycan Composition

class glypy.structure.glycan_composition.GlycanComposition(*args, **kwargs)

Describe a glycan as a collection of MonosaccharideResidue counts without explicit linkage information relating how each monosaccharide is connected to its neighbors.

This class subclasses dict, and assumes that keys will either be MonosaccharideResidue instances, SubstituentResidue instances, or strings in iupac_lite format which will be parsed into one of these types. While other types may be used, this is not recommended. All standard dict methods are supported.

GlycanComposition objects may be derivatized just as Glycan objects are, with glypy.composition.composition_transform.derivatize() and glypy.composition.composition_transform.strip_derivatization().

GlycanComposition objects also support composition arithmetic: they can be added to or subtracted from each other, or multiplied by an integer.
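Composition arithmetic amounts to key-wise addition and subtraction of counts, plus scaling by an integer. The following is a hypothetical dict subclass (CountBag) that illustrates those semantics with plain string keys. It is not GlycanComposition's implementation and ignores residue parsing, derivatization, and reducing ends.

```python
class CountBag(dict):
    """Hypothetical dict of residue name -> count supporting +, - and integer *."""

    def __add__(self, other):
        result = CountBag(self)
        for key, count in other.items():
            result[key] = result.get(key, 0) + count
        return result

    def __sub__(self, other):
        result = CountBag(self)
        for key, count in other.items():
            result[key] = result.get(key, 0) - count
        return result

    def __mul__(self, factor):
        if not isinstance(factor, int):
            return NotImplemented
        return CountBag({key: count * factor for key, count in self.items()})

    __rmul__ = __mul__


core = CountBag(Hex=3, HexNAc=2)
antennae = CountBag(Hex=2, HexNAc=2)
print(core + antennae)         # {'Hex': 5, 'HexNAc': 4}
print(core - CountBag(Hex=1))  # {'Hex': 2, 'HexNAc': 2}
print(core * 2)                # {'Hex': 6, 'HexNAc': 4}
```

Subtraction is allowed to produce zero or negative counts, in line with the g["@n-acetyl"] = -2 example at the top of this page.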

As GlycanComposition is not a complete structure, it cannot be translated into the text formats that full Glycan objects support. Instead, it may be converted to a short-form text notation using GlycanComposition.serialize() and reconstructed from that notation using GlycanComposition.parse().

Variables
  • reducing_end (ReducedEnd) – Describe the reducing end of the aggregate without binding it to a specific monosaccharide. This will contribute to composition and mass calculations.

  • _composition_offset (CComposition) – Account for the one water molecule’s worth of composition left over from applying the “residue” transformation to each monosaccharide in the aggregate.
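Because every residue has already lost one water, the mass of the whole composition is the sum of its residue masses plus the single leftover water described by _composition_offset. The following pure-Python sketch applies this to the Hex3HexNAc2 composition from the example at the top of this page. The residue formula table and helper names are hypothetical, and reducing-end modifications are ignored.

```python
# Monoisotopic element masses (standard reference values, in Da).
ELEMENT_MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}

# Residue formulas: each free monosaccharide minus one water.
RESIDUE_FORMULA = {
    "Hex": {"C": 6, "H": 10, "O": 5},
    "HexNAc": {"C": 8, "H": 13, "N": 1, "O": 5},
}
WATER = {"H": 2, "O": 1}


def formula_mass(formula):
    return sum(ELEMENT_MASS[element] * count for element, count in formula.items())


def composition_mass(counts):
    """Sum the residue masses, then add back the one leftover water."""
    total = sum(formula_mass(RESIDUE_FORMULA[name]) * count for name, count in counts.items())
    return total + formula_mass(WATER)


print(round(composition_mass({"Hex": 3, "HexNAc": 2}), 4))  # 910.3278
```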

__init__(*args, **kwargs)

Initialize a GlycanComposition using the provided objects or keyword arguments, imitating the dict initialization signature.

If a Mapping is provided as a positional argument, it will be used as a template. If arbitrary keyword arguments are provided, they will be interpreted using update(). As a special case, if another GlycanComposition is provided, its reducing_end attribute will also be copied.

Parameters
  • *args – Arbitrary positional arguments

  • **kwargs – Arbitrary keyword arguments

collapse()

Merge redundant keys.

After performing a structure-detail removing operation like drop_positions(), drop_configurations(), or drop_stems(), monosaccharide keys may be redundant.

collapse will merge keys which refer to the same type of molecule.
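Merging redundant keys means re-keying each count through the detail-dropping step and summing counts that land on the same key. This is a hypothetical sketch using plain names and a made-up stem-to-base-type table. glypy's collapse() operates on residue objects in place, whereas this version returns a new dict.

```python
# Hypothetical stand-in for a detail-dropping step such as drop_stems():
# specific stems map to their generic base type; other names are left unchanged.
GENERIC = {"Glc": "Hex", "Gal": "Hex", "Man": "Hex", "GlcNAc": "HexNAc", "GalNAc": "HexNAc"}


def drop_stem_name(name):
    return GENERIC.get(name, name)


def collapse(counts, normalize):
    """Re-key counts with normalize() and merge keys that become identical."""
    merged = {}
    for key, count in counts.items():
        new_key = normalize(key)
        merged[new_key] = merged.get(new_key, 0) + count
    return merged


print(collapse({"Man": 3, "Gal": 2, "GlcNAc": 4, "Fuc": 1}, drop_stem_name))
# {'Hex': 5, 'HexNAc': 4, 'Fuc': 1}
```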

copy() -> a shallow copy of D

classmethod from_glycan(glycan)

Convert a Glycan into a GlycanComposition.

Parameters

glycan (Glycan) – The instance to be converted

Return type

GlycanComposition

mass(average=False, charge=0, mass_data=None)

Calculates the total mass of self.

Note

The monoisotopic mass is cached on first computation in _mass.

Parameters
  • average (bool, optional, defaults to False) – Whether or not to use the average isotopic composition when calculating masses. When average == False, masses are calculated using monoisotopic mass.

  • charge (int, optional, defaults to 0) – If charge is non-zero, m/z is calculated, where m is the theoretical mass, and z is charge

  • mass_data (dict, optional) – If mass_data is None, standard NIST mass and isotopic abundance data are used. Otherwise the contents of mass_data are assumed to contain elemental mass and isotopic abundance information. Defaults to None.

Return type

float

classmethod parse(string)

Parse a str into a GlycanComposition.

This will parse the format produced by serialize()

Parameters

string (str) – The string to parse

Return type

GlycanComposition

query(query, exact=True, **kwargs)

Return the total count of all residues in self which match query using glypy.io.nomenclature.identity.is_a()

Returns

The total count of all residues which satisfy the is-a relationship

Return type

int

reinterpret(references, exact=True, **kwargs)

Aggregate the counts of all residues in self for each monosaccharide in references satisfying an is-a relationship, collapsing multiple residues to a single key. Any residue not aggregated will be preserved as-is.

Note

The order of references matters as any residue matched by a reference will not be considered for later references.

Parameters
  • references (Iterable of MonosaccharideResidue) – The monosaccharides with which to test for an is-a relationship

  • exact (bool, optional) – Passed to is_a(). Explicitly True by default

  • **kwargs – Passed to is_a()

Returns

self after key collection and collapse

Return type

GlycanComposition
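Both query() and reinterpret() depend on an is-a test between residues. In the following sketch, a hypothetical parent table stands in for glypy.io.nomenclature.identity.is_a(). The exact flag and extra keyword arguments are ignored, and reinterpret returns a new dict rather than modifying self. The last line shows why the order of references matters.

```python
# Hypothetical is-a hierarchy standing in for glypy.io.nomenclature.identity.is_a().
PARENT = {"Glc": "Hex", "Gal": "Hex", "Man": "Hex", "GlcNAc": "HexNAc", "GalNAc": "HexNAc"}


def is_a(name, reference):
    return name == reference or PARENT.get(name) == reference


def query(counts, reference):
    """Total count of residues satisfying is_a(residue, reference)."""
    return sum(count for name, count in counts.items() if is_a(name, reference))


def reinterpret(counts, references):
    """Fold matching residues into each reference in order; keep the rest as-is."""
    remaining = dict(counts)
    result = {}
    for reference in references:
        for name in [n for n in remaining if is_a(n, reference)]:
            result[reference] = result.get(reference, 0) + remaining.pop(name)
    for name, count in remaining.items():
        result[name] = result.get(name, 0) + count
    return result


counts = {"Man": 3, "Gal": 2, "GlcNAc": 4, "Fuc": 1}
print(query(counts, "Hex"))                    # 5
print(reinterpret(counts, ["Hex", "HexNAc"]))  # {'Hex': 5, 'HexNAc': 4, 'Fuc': 1}
# Order matters: "Man" claims the mannoses before "Hex" sees them.
print(reinterpret(counts, ["Man", "Hex"]))     # {'Man': 3, 'Hex': 2, 'GlcNAc': 4, 'Fuc': 1}
```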

total_composition()

Computes the sum of the composition of all Monosaccharide objects in self

Return type

Composition

update([E, ]**F) -> None. Update D from dict/iterable E and F.

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]
If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v
In either case, this is followed by: for k in F: D[k] = F[k]

Frozen Composition

GlycanComposition objects automatically convert str arguments to MonosaccharideResidue instances, which, as previously mentioned, can be slow. If the key objects will not be modified, FrozenGlycanComposition is considerably faster for all operations. If neither the keys nor the values will be modified after creation, HashableGlycanComposition can be used instead, and its instances are hashable.

class glypy.structure.glycan_composition.FrozenGlycanComposition(*args, **kwargs)

A subclass of GlycanComposition which uses FrozenMonosaccharideResidue instead of MonosaccharideResidue which reduces the number of times from_iupac_lite() is called.

Only use this type when residue names are pre-validated, residue types will not be transformed, and a very large number of instances will be created. from_iupac_lite() invokes expensive introspection algorithms, which can be costly when the same residue types are manipulated repeatedly.

classmethod parse(string)

Parse a str into a GlycanComposition.

This will parse the format produced by serialize()

Parameters

string (str) – The string to parse

Return type

GlycanComposition

thaw()

Convert this FrozenGlycanComposition into a GlycanComposition that is not frozen.

Return type

GlycanComposition

class glypy.structure.glycan_composition.HashableGlycanComposition(*args, **kwargs)

IUPAClite

glypy.structure.glycan_composition.to_iupac_lite(residue)
glypy.structure.glycan_composition.from_iupac_lite(string, residue_class=None)