Glycan Compositions and Residues
GlycanComposition, MonosaccharideResidue, and SubstituentResidue are useful for working with bags of residues where topology and connections are not relevant but the aggregate composition is known. These types work with a subset of the IUPAC three-letter code for specifying compositions.
A Monosaccharide is meant to describe precisely where all of the bonds from the carbon backbone are. A MonosaccharideResidue abstracts away the notion of position and automatically deducts a water molecule from its composition to account for a single incoming and a single outgoing glycosidic bond. Because it does not try to completely describe the physical configuration of the molecule, MonosaccharideResidue removes information about ring type, anomericity, configuration, and, optionally, stem type. The level of detail discarded is customizable in the MonosaccharideResidue.from_monosaccharide() class method.
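As a quick check of the water deduction, the sketch below computes the monoisotopic mass of a free hexose and of the corresponding Hex residue in plain Python. It does not call glypy; the `MONO` table holds standard monoisotopic element masses, and `mass` and `as_residue` are hypothetical helpers written only for this illustration.

```python
# Standard monoisotopic element masses (Da).
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def mass(formula):
    """Monoisotopic mass of an {element: count} mapping."""
    return sum(MONO[element] * count for element, count in formula.items())


def as_residue(formula):
    """Hypothetical helper: remove one H2O for the incoming and outgoing glycosidic bonds."""
    residue = dict(formula)
    residue["H"] -= 2
    residue["O"] -= 1
    return residue


hexose = {"C": 6, "H": 12, "O": 6}   # free hexose, C6H12O6
hex_residue = as_residue(hexose)     # Hex residue, C6H10O5

print(round(mass(hexose), 4))        # 180.0634
print(round(mass(hex_residue), 4))   # 162.0528
```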
A GlycanComposition is just a bag of MonosaccharideResidue and SubstituentResidue objects, similar to Composition. Its keys may be MonosaccharideResidue instances, SubstituentResidue instances, or strings which can be parsed by from_iupac_lite(), and its values are integers. A GlycanComposition can also be written to a string using serialize() and read back from one using parse().
>>> g = GlycanComposition(Hex=3, HexNAc=2)
>>> g["Hex"]
3
>>> r = MonosaccharideResidue.from_iupac_lite("Hex")
>>> r
MonosaccharideResidue(Hex)
>>> g[r]
3
>>> import glypy
>>> abs(g.mass() - glypy.motifs["N-Glycan core basic 1"].mass()) < 1e-5
True
>>> g2 = GlycanComposition(Hex=5)
>>> g["@n-acetyl"] = -2 # Remove two n-acetyl groups from the composition
>>> abs(g.mass() - g2.mass()) < 1e-5
True
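The mass comparison with the N-glycan core in the doctest holds because of how residues are counted. A glycan of n monosaccharides has n - 1 glycosidic bonds, so summing n residues, each one water lighter, removes one water too many; an aggregate has to add a single water back, which is the role the _composition_offset attribute of GlycanComposition is documented to play. The plain-Python sketch below reproduces that bookkeeping for Hex3HexNAc2 with Counter arithmetic. The helper names are hypothetical and no glypy calls are involved.

```python
from collections import Counter

# Standard monoisotopic element masses (Da).
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}

WATER = Counter({"H": 2, "O": 1})
FREE = {
    "Hex": Counter({"C": 6, "H": 12, "O": 6}),              # C6H12O6
    "HexNAc": Counter({"C": 8, "H": 15, "N": 1, "O": 6}),   # C8H15NO6
}


def mass(formula):
    return sum(MONO[element] * count for element, count in formula.items())


def total(counts, unit):
    """Add up unit(name) once per residue in a {name: count} mapping."""
    out = Counter()
    for name, n in counts.items():
        for _ in range(n):
            out += unit(name)
    return out


counts = {"Hex": 3, "HexNAc": 2}
n_units = sum(counts.values())

# Route 1: sum the residues (each one water lighter), then add one water back.
from_residues = total(counts, lambda name: FREE[name] - WATER) + WATER

# Route 2: condense the free monosaccharides through n_units - 1 glycosidic bonds.
condensed = total(counts, lambda name: FREE[name])
for _ in range(n_units - 1):
    condensed -= WATER

print(from_residues == condensed)     # True
print(round(mass(from_residues), 4))  # 910.3278
```

Both routes give C34H58N2O26, which is why a residue-based composition and a fully connected structure with the same monosaccharides agree on mass.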
IUPAClite
IUPAClite is a dialect of IUPAC for describing monosaccharides while omitting some precise structural information from the grammar. It also includes a compact line notation for glycan compositions.
A monosaccharide is denoted using IUPAC notation, omitting ring shape, anomeric state, chirality, and, optionally, modification positions. For example, a-D-Manp would be written Man, and b-D-Glcp2NAc would be written GlcNAc.
You can also use generic base types such as Hex or Pen to denote a six- or five-carbon monosaccharide. The notation is composable, so you can specify an arbitrarily modified monosaccharide, such as HexNAc(S) for a sulfated HexNAc (substituent groups are written in parentheses), or dHexN for a deoxyhexosamine.
You can also define “floating” substituent groups by prefixing their full lowercase names with an @-sign, like @sulfate for sulfate or @acetyl for an acetyl group. Lastly, it is also possible to denote an arbitrary named group using the notation #<name>#<chemical-formula>, though this should be used only when no other option is available.
See from_iupac_lite() and to_iupac_lite() for the implementations of monomer reading and writing.
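The three kinds of monomer token described above can be told apart by their leading character. The sketch below is a hypothetical classifier, not glypy's from_iupac_lite(); it only decides which kind of token it has been given, and the group name and formula in the usage loop are made-up placeholders.

```python
import re

# Hypothetical pattern for the "#<name>#<chemical-formula>" form.
NAMED_GROUP = re.compile(r"^#(?P<name>[^#]+)#(?P<formula>[^#]+)$")


def classify_token(token):
    """Sort one IUPAClite token into substituent, named group, or monosaccharide.

    Monosaccharide names such as "HexNAc(S)" are returned untouched; this sketch
    does not parse base types, modifications, or substituents inside them.
    """
    if token.startswith("@"):
        return ("substituent", token[1:])
    match = NAMED_GROUP.match(token)
    if match is not None:
        return ("named_group", (match.group("name"), match.group("formula")))
    return ("monosaccharide", token)


for token in ["HexNAc(S)", "dHexN", "@sulfate", "#mygroup#C2H3O"]:
    print(token, "->", classify_token(token))
```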
A glycan composition is written as one or more <monosaccharide>:<count> occurrences separated by “; ” (semicolon + space) and enclosed in “{ }”. See GlycanComposition.parse() and GlycanComposition.serialize() for the implementation. A few examples are shown below:
{Hex:5; HexNAc:4; Neu5Ac:1}
{Hex:5; HexNAc:4; Neu5Ac:2}
{Fuc:1; Hex:5; HexNAc:4; Neu5Ac:2}
{Fuc:2; Hex:6; HexNAc:5; Neu5Ac:1}
{Fuc:1; Hex:6; HexNAc:5; Neu5Ac:2}
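The text format is simple enough to round-trip with a few lines of Python. The sketch below uses hypothetical `parse_composition` and `serialize_composition` helpers, not GlycanComposition.parse() and serialize(); it keeps keys as plain strings instead of converting them to residues, and writing keys in sorted order is an assumption chosen to match the ordering of the examples above.

```python
import re

# One "<key>:<count>" entry; counts may be negative, as in the doctest above.
ENTRY = re.compile(r"^(?P<key>[^:]+):(?P<count>-?\d+)$")


def parse_composition(text):
    """Parse '{Hex:5; HexNAc:4}' into {'Hex': 5, 'HexNAc': 4} (sketch only)."""
    text = text.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("a composition must be enclosed in '{ }'")
    body = text[1:-1].strip()
    counts = {}
    if not body:
        return counts
    for entry in body.split("; "):
        match = ENTRY.match(entry.strip())
        if match is None:
            raise ValueError(f"malformed entry: {entry!r}")
        counts[match.group("key")] = int(match.group("count"))
    return counts


def serialize_composition(counts):
    """Write entries in sorted key order, matching the ordering of the examples."""
    return "{" + "; ".join(f"{key}:{count}" for key, count in sorted(counts.items())) + "}"


text = "{Fuc:1; Hex:5; HexNAc:4; Neu5Ac:2}"
counts = parse_composition(text)
print(counts)                                 # {'Fuc': 1, 'Hex': 5, 'HexNAc': 4, 'Neu5Ac': 2}
print(serialize_composition(counts) == text)  # True
```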
Residues
- class glypy.structure.glycan_composition.MonosaccharideResidue(*args, **kwargs)
Represents a Monosaccharide-like object, save that it does not connect to other Monosaccharide objects and does not have properties related to topology, specifically, anomer.
A single MonosaccharideResidue has lost a water molecule from its composition, reflecting its residual nature. This is accounted for when dealing with aggregates of residues. Residues also have altered carbon backbone occupancies.
MonosaccharideResidue objects are hashable and comparable on their iupac_lite representation, which is given by __str__() or name().
- clone(*args, **kwargs)
Copies just this Monosaccharide and its Substituent objects, creating a separate instance with the same data. All mutable data structures are duplicated and distinct from the original.
Does not copy any links, as this would cause recursive duplication of the entire Glycan graph.
- Parameters
prop_id (bool) – Whether to copy id from self to the new instance
fast (bool) – Whether to use the fast-path initialization process in MonosaccharideResidue.__init__()
monosaccharide_type (type) – A subclass of MonosaccharideResidue to use
- Return type
- drop_configuration(force=False)
Drops the absolute stereochemical configuration of this monosaccharide.
Unless force is True, if resolve_special_base_type() returns a truthy value, this function will do nothing.
- Parameters
residue (Monosaccharide) – The monosaccharide to change
force (bool, optional) – Whether or not to override known special case named monosaccharides
- Returns
The mutated monosaccharide
- Return type
- drop_positions(force=False)
Drops the position classifiers from all links and modifications attached to this monosaccharide.
Unless force is True, if resolve_special_base_type() returns a truthy value, this function will do nothing.
- Parameters
residue (Monosaccharide) – The monosaccharide to change
force (bool, optional) – Whether or not to override known special case named monosaccharides
- Returns
The mutated monosaccharide
- Return type
- drop_stem(force=False)
Drops the stem, or the carbon ring stereochemical classification, from this monosaccharide.
Unless force is True, if resolve_special_base_type() returns a truthy value, this function will do nothing.
- Parameters
residue (Monosaccharide) – The monosaccharide to change
force (bool, optional) – Whether or not to override known special case named monosaccharides
- Returns
The mutated monosaccharide
- Return type
- classmethod from_monosaccharide(monosaccharide, configuration=False, stem=True, ring=False)
Construct an instance of MonosaccharideResidue from an instance of Monosaccharide. This function attempts to preserve derivatization if possible.
This function will create a deep copy of monosaccharide.
- Parameters
monosaccharide (Monosaccharide) – The monosaccharide to be converted
configuration (bool, optional) – Whether or not to preserve Configuration. Defaults to False
stem (bool, optional) – Whether or not to preserve Stem. Defaults to True
ring (bool, optional) – Whether or not to preserve RingType. Defaults to False
- Return type
- open_attachment_sites(max_occupancy=0)
When attaching Monosaccharide instances to other objects, bonds are formed between the carbohydrate backbone and the other object. If a site is already bound, the occupying object fills that space on the backbone and prevents other objects from binding there.
Currently only the availability of the hydroxyl group is considered. As there is no hydroxyl attached to the ring-ending carbon, that carbon is not considered an open site.
If any existing attached units have unknown positions, no known positions can be provided, in which case the list of open positions will be a list of -1s with one entry per open site.
A MonosaccharideResidue has two fewer open attachment sites than the equivalent Monosaccharide.
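The counting rules for open_attachment_sites() can be illustrated with a small toy model. The function names below are hypothetical, and treating every carbon except the ring-ending one (including the anomeric carbon) as a hydroxyl-bearing candidate is a simplifying assumption of this sketch; the printed values come from the toy model, not from glypy.

```python
def open_sites(n_carbons, ring_end, occupied):
    """Toy model of the open-site rule (not glypy's algorithm).

    n_carbons: length of the carbon backbone (6 for a hexose)
    ring_end:  carbon bonded to the ring oxygen, which carries no hydroxyl
    occupied:  hydroxyl positions already bound; -1 marks an unknown position
    """
    candidates = [c for c in range(1, n_carbons + 1) if c != ring_end]
    n_open = max(len(candidates) - len(occupied), 0)
    if -1 in occupied:
        # Some attachment position is unknown, so no open site can be named.
        return [-1] * n_open
    return [c for c in candidates if c not in occupied]


def residue_open_site_count(n_carbons, ring_end, occupied):
    """A residue reserves two sites for its incoming and outgoing glycosidic bonds."""
    return max(len(open_sites(n_carbons, ring_end, occupied)) - 2, 0)


# A hexopyranose: six backbone carbons, ring closed through C5.
print(open_sites(6, 5, []))               # [1, 2, 3, 4, 6]
print(open_sites(6, 5, [2]))              # [1, 3, 4, 6]
print(open_sites(6, 5, [-1]))             # [-1, -1, -1, -1]
print(residue_open_site_count(6, 5, []))  # 3
```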
Frozen Residues
MonosaccharideResidue operations may require str conversions, which can be expensive. Instead, use FrozenMonosaccharideResidue, which is immutable once created and substantially faster.
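The speedup comes from doing the expensive work once per distinct residue name and then reusing an immutable instance. The class below is a hypothetical stand-in that shows only this caching idea; it is not glypy's FrozenMonosaccharideResidue, and it stores a name instead of parsing real IUPAClite.

```python
class FrozenResidueSketch:
    """Hypothetical stand-in showing the caching idea; not glypy's class."""

    _cache = {}            # name -> shared instance
    __slots__ = ("_name",)

    def __init__(self, name):
        # Direct construction bypasses the cache, like plain instance creation.
        object.__setattr__(self, "_name", name)

    def __setattr__(self, key, value):
        raise AttributeError("frozen residues are immutable")

    @classmethod
    def from_name(cls, name):
        """Build each distinct name once, then hand back the cached instance."""
        instance = cls._cache.get(name)
        if instance is None:
            instance = cls(name)
            cls._cache[name] = instance
        return instance

    def __str__(self):
        return self._name  # stored once, never recomputed

    def __eq__(self, other):
        return str(self) == str(other)

    def __hash__(self):
        return hash(self._name)


a = FrozenResidueSketch.from_name("HexNAc")
b = FrozenResidueSketch.from_name("HexNAc")
print(a is b)                              # True
print(FrozenResidueSketch("HexNAc") is a)  # False: direct construction is not cached
```

Because instances cannot change after creation, sharing one object between many compositions is safe, and its hash stays valid for dict lookups.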
- class glypy.structure.glycan_composition.FrozenMonosaccharideResidue(*args, **kwargs)
A subclass of MonosaccharideResidue which caches the result of to_iupac_lite() and instances returned by FrozenMonosaccharideResidue.clone() and FrozenMonosaccharideResidue.from_iupac_lite(). Instances are also treated as immutable after initialization through FrozenMonosaccharideResidue.from_monosaccharide().
Note that directly calling FrozenMonosaccharideResidue.from_monosaccharide() will not retrieve instances from the cache, and direct initialization using normal instance creation will neither touch the cache nor freeze the instance.
This type is intended for use with FrozenGlycanComposition to minimize the number of times from_iupac_lite() is called.
- clone(*args, **kwargs)
Copies just this Monosaccharide and its Substituent objects, creating a separate instance with the same data. All mutable data structures are duplicated and distinct from the original.
Does not copy any links, as this would cause recursive duplication of the entire Glycan graph.
- classmethod from_iupac_lite(string)
Parse a string of iupac_lite notation to produce a residue object.
- Parameters
string (str) – The string to parse
- Return type
ResidueBase
- classmethod from_monosaccharide(monosaccharide, *args, **kwargs)
Construct an instance of MonosaccharideResidue from an instance of Monosaccharide. This function attempts to preserve derivatization if possible.
This function will create a deep copy of monosaccharide.
- Parameters
monosaccharide (Monosaccharide) – The monosaccharide to be converted
configuration (bool, optional) – Whether or not to preserve Configuration. Defaults to False
stem (bool, optional) – Whether or not to preserve Stem. Defaults to True
ring (bool, optional) – Whether or not to preserve RingType. Defaults to False
- Return type
- mass(average=False, charge=0, mass_data=None, substituents=True)
Calculates the total mass of self.
- Parameters
average (bool, optional, defaults to False) – Whether or not to use the average isotopic composition when calculating masses. When average == False, masses are calculated using monoisotopic mass.
charge (int, optional, defaults to 0) – If charge is non-zero, m/z is calculated, where m is the theoretical mass, and z is charge
mass_data (dict, optional) – If mass_data is None, standard NIST mass and isotopic abundance data are used. Otherwise the contents of mass_data are assumed to contain elemental mass and isotopic abundance information. Defaults to None.
substituents (bool, optional, defaults to True) – Whether or not to include substituents’ masses.
- Return type
- total_composition()
Computes the sum of the composition of self and each of its linked Substituent objects.
- Return type
Composition
Substituent Residues
- class glypy.structure.glycan_composition.SubstituentResidue(name, composition=None, id=None, links=None, can_nh_derivatize=None, is_nh_derivatizable=None, derivatize=False, attachment_composition=None)
Represent substituent molecules unassociated with a specific monosaccharide residue.
Note
SubstituentResidue’s composition value includes the losses for forming a bond between a monosaccharide residue and the substituent.
- Variables
name (str) – As in Substituent, but with SubstituentResidue.sigil prepended.
composition (Composition) –
links (OrderedMultiMap) –
_order (int) –
- classmethod from_iupac_lite(name)
Parse a string of iupac_lite notation to produce a residue object.
- Parameters
name (str) – The string to parse
- Return type
ResidueBase
- sigil = '@'
All substituent string identifiers are prefixed with this character for the from_iupac_lite() parser.
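The note that a SubstituentResidue composition already includes bond-forming losses can be made concrete for an N-acetyl group, the substituent the earlier doctest removes with g["@n-acetyl"] = -2. The sketch below is plain Counter arithmetic, not glypy; the net composition C2H3N is derived by hand from replacing a ring hydroxyl with -NH-CO-CH3 and is an assumption of this illustration rather than a value read from glypy's substituent tables.

```python
from collections import Counter

HEX_RESIDUE = Counter({"C": 6, "H": 10, "O": 5})               # C6H10O5
HEXNAC_RESIDUE = Counter({"C": 8, "H": 13, "N": 1, "O": 5})    # C8H13NO5

# An N-acetyl group (-NH-CO-CH3, C2H4NO) takes the place of a ring hydroxyl (-OH).
# Netting out the hydroxyl gives the composition a substituent residue would carry.
n_acetyl_group = Counter({"C": 2, "H": 4, "N": 1, "O": 1})
hydroxyl = Counter({"H": 1, "O": 1})
n_acetyl_residue = n_acetyl_group - hydroxyl  # C2H3N; Counter drops the zero O count

print(HEX_RESIDUE + n_acetyl_residue == HEXNAC_RESIDUE)  # True


def repeat(formula, n):
    """Add a formula to itself n times."""
    out = Counter()
    for _ in range(n):
        out += formula
    return out


# Mirrors the doctest: Hex3HexNAc2 with two n-acetyl groups removed matches Hex5.
# The single water added to every aggregate appears on both sides, so it is omitted.
core = repeat(HEX_RESIDUE, 3) + repeat(HEXNAC_RESIDUE, 2)
stripped = core - repeat(n_acetyl_residue, 2)
print(stripped == repeat(HEX_RESIDUE, 5))  # True
```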
Glycan Composition
- class glypy.structure.glycan_composition.GlycanComposition(*args, **kwargs)
Describe a glycan as a collection of MonosaccharideResidue counts without explicit linkage information relating how each monosaccharide is connected to its neighbors.
This class subclasses dict, and assumes that keys will either be MonosaccharideResidue instances, SubstituentResidue instances, or strings in iupac_lite format which will be parsed into one of these types. While other types may be used, this is not recommended. All standard dict methods are supported.
GlycanComposition objects may be derivatized just as Glycan objects are, with glypy.composition.composition_transform.derivatize() and glypy.composition.composition_transform.strip_derivatization().
GlycanComposition objects also support composition arithmetic: they can be added to or subtracted from each other, or multiplied by an integer.
As a GlycanComposition is not a complete structure, it cannot be translated into text formats as full Glycan objects are. It may instead be converted to a short-form text notation using GlycanComposition.serialize() and reconstructed from this format using GlycanComposition.parse().
- Variables
reducing_end (ReducedEnd) – Describe the reducing end of the aggregate without binding it to a specific monosaccharide. This will contribute to composition and mass calculations.
_composition_offset (CComposition) – Account for the one water molecule’s worth of composition left over from applying the “residue” transformation to each monosaccharide in the aggregate.
- __init__(*args, **kwargs)
Initialize a GlycanComposition using the provided objects or keyword arguments, imitating the dict initialization signature.
If a Mapping is provided as a positional argument, it will be used as a template. If arbitrary keyword arguments are provided, they will be interpreted using update(). As a special case, if another GlycanComposition is provided, its reducing_end attribute will also be copied.
- Parameters
*args – Arbitrary positional arguments
**kwargs – Arbitrary keyword arguments
- collapse()
Merge redundant keys.
After performing a structure-detail removing operation like drop_positions(), drop_configurations(), or drop_stems(), monosaccharide keys may be redundant. collapse will merge keys which refer to the same type of molecule.
- classmethod from_glycan(glycan)
Convert a Glycan into a GlycanComposition.
- Parameters
glycan (Glycan) – The instance to be converted
- Return type
- mass(average=False, charge=0, mass_data=None)
Calculates the total mass of self.
Note
The monoisotopic mass is cached on first computation in _mass.
- Parameters
average (bool, optional, defaults to False) – Whether or not to use the average isotopic composition when calculating masses. When average == False, masses are calculated using monoisotopic mass.
charge (int, optional, defaults to 0) – If charge is non-zero, m/z is calculated, where m is the theoretical mass, and z is charge
mass_data (dict, optional) – If mass_data is None, standard NIST mass and isotopic abundance data are used. Otherwise the contents of mass_data are assumed to contain elemental mass and isotopic abundance information. Defaults to None.
- Return type
- classmethod parse(string)
Parse a str into a GlycanComposition.
This will parse the format produced by serialize().
- Parameters
string (str) – The string to parse
- Return type
- query(query, exact=True, **kwargs)
Return the total count of all residues in self which match query using glypy.io.nomenclature.identity.is_a().
- Parameters
query (MonosaccharideResidue or str) – A monosaccharide residue, or a string which will be converted into one by from_iupac_lite(), to test for an is-a relationship with.
exact (bool, optional) – Passed to is_a(). Explicitly True by default
**kwargs – Passed to is_a()
- Returns
The total count of all residues which satisfy the is-a relationship
- Return type
- reinterpret(references, exact=True, **kwargs)
Aggregate the counts of all residues in self for each monosaccharide in references satisfying an is-a relationship, collapsing multiple residues to a single key. Any residue not aggregated will be preserved as-is.
Note
The order of references matters, as any residue matched by a reference will not be considered for later references.
- Parameters
references (Iterable of MonosaccharideResidue) – The monosaccharides with which to test for an is-a relationship
exact (bool, optional) – Passed to is_a(). Explicitly True by default
**kwargs – Passed to is_a()
- Returns
self after key collection and collapse
- Return type
- total_composition()
Computes the sum of the composition of all Monosaccharide objects in self.
- Return type
Composition
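The ordering note on reinterpret() is easiest to see with overlapping references. The sketch below replaces glypy's is_a() with a hypothetical lookup table and works on plain dicts; unlike reinterpret(), which is documented to modify and return self, it returns a new mapping, and it ignores the exact flag.

```python
# Toy is-a table standing in for glypy.io.nomenclature.identity.is_a().
TOY_PARENTS = {
    "Glc": {"Hex"},
    "Man": {"Hex"},
    "Gal": {"Hex"},
    "GlcNAc": {"HexNAc"},
    "GalNAc": {"HexNAc"},
}


def toy_is_a(residue, reference):
    return residue == reference or reference in TOY_PARENTS.get(residue, set())


def query(composition, reference):
    """Total count of residues that satisfy toy_is_a(residue, reference)."""
    return sum(n for name, n in composition.items() if toy_is_a(name, reference))


def reinterpret(composition, references):
    """Fold matching residues into each reference in order; the first match wins."""
    remaining = dict(composition)
    result = {}
    for reference in references:
        matched = [name for name in remaining if toy_is_a(name, reference)]
        total = sum(remaining.pop(name) for name in matched)
        if total:
            result[reference] = result.get(reference, 0) + total
    for name, n in remaining.items():
        result[name] = result.get(name, 0) + n  # unmatched residues are kept as-is
    return result


composition = {"Man": 3, "Gal": 2, "GlcNAc": 4, "Fuc": 1}
print(query(composition, "Hex"))                 # 5
print(reinterpret(composition, ["Man", "Hex"]))  # {'Man': 3, 'Hex': 2, 'GlcNAc': 4, 'Fuc': 1}
print(reinterpret(composition, ["Hex", "Man"]))  # {'Hex': 5, 'GlcNAc': 4, 'Fuc': 1}
```

Listing "Hex" first absorbs every mannose before "Man" is tried, so the more specific reference should come first when both are wanted.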
Frozen Composition
GlycanComposition objects automatically convert str arguments to MonosaccharideResidue instances, which, as previously mentioned, can be slow. If key objects will not be modified, FrozenGlycanComposition is considerably faster for all operations. If neither the keys themselves nor the values will be modified after creation, HashableGlycanComposition is also useful, as it is hashable.
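Hashability is why both keys and values must stay fixed: if a composition changed after being placed in a set or used as a dict key, its hash would no longer match the slot it was stored under. The class below is a hypothetical read-only dict subclass, not glypy's HashableGlycanComposition; it hashes a frozenset of its items so that two compositions listing the same counts in a different order hash alike.

```python
class HashableCompositionSketch(dict):
    """Hypothetical read-only composition usable as a dict key or set member."""

    def _read_only(self, *args, **kwargs):
        raise TypeError("this composition is read-only")

    # Block every dict method that would change keys or values after creation.
    __setitem__ = __delitem__ = __ior__ = _read_only
    clear = pop = popitem = setdefault = update = _read_only

    def __hash__(self):
        # Entry order is irrelevant to equality, so hash an unordered snapshot.
        return hash(frozenset(self.items()))


a = HashableCompositionSketch(Hex=5, HexNAc=4)
b = HashableCompositionSketch({"HexNAc": 4, "Hex": 5})
print(a == b, hash(a) == hash(b))  # True True
print(len({a, b}))                 # 1
```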
- class glypy.structure.glycan_composition.FrozenGlycanComposition(*args, **kwargs)
A subclass of GlycanComposition which uses FrozenMonosaccharideResidue instead of MonosaccharideResidue, reducing the number of times from_iupac_lite() is called.
Only use this type if residue names are pre-validated, residue types will not be transformed, and many, many instances will be created. from_iupac_lite() invokes expensive introspection algorithms, which can be costly when the same residue types are manipulated repeatedly.
- classmethod parse(string)
Parse a str into a GlycanComposition.
This will parse the format produced by serialize().
- Parameters
string (str) – The string to parse
- Return type
- thaw()
Convert this FrozenGlycanComposition into a GlycanComposition that is not frozen.
- Return type
IUPAClite
- glypy.structure.glycan_composition.to_iupac_lite(residue)
- glypy.structure.glycan_composition.from_iupac_lite(string, residue_class=None)