GlycoCT

A parser for the GlycoCT{condensed} format.

GlycoCT{condensed} is a multi-line format for representing glycan structures and compositions, published in [1]. The format is intended to be human-readable and easily compressed, and it includes a canonicalization algorithm that ensures each glycan structure has exactly one representation.

GlycoCT{condensed} can represent glycan structures with ambiguous or repeating sub-units. The specification includes additional section directives with support for stochastic sub-units as well as disjoint subgraphs, though these have not been implemented in glypy.

References

[1] Herget, S., Ranzinger, R., Maass, K., & Lieth, C.-W. V. D. (2008). GlycoCT-a unifying sequence format for carbohydrates. Carbohydrate Research, 343(12), 2162–2171. https://doi.org/10.1016/j.carres.2008.03.011

High Level Functions

glypy.io.glycoct.dump(structure, buffer=None)[source]

Serialize the Glycan into GlycoCT{condensed}, using buffer to store the result. If buffer is None, then the function will operate on a newly created StringIO object.

Parameters
  • structure (Glycan) – The structure to serialize

  • buffer (file-like or None) – The stream to write the serialized structure to. If None, uses an instance of StringIO

Return type

file-like or str if buffer is None
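
The optional-buffer behaviour described above follows a common Python convention. The sketch below illustrates that convention with a hypothetical dump_text helper that writes plain strings rather than Glycan objects. It is not glypy's implementation, and the return behaviour is an assumption based on the documented return type.

```python
from io import StringIO


def dump_text(text, buffer=None):
    """Hypothetical stand-in for a serializer that writes to an optional buffer."""
    # Remember whether the caller supplied a stream of their own.
    created_here = buffer is None
    if created_here:
        buffer = StringIO()
    buffer.write(text)
    # Assumption: hand back the text when the buffer was created here,
    # otherwise return the caller's stream so they can keep writing to it.
    return buffer.getvalue() if created_here else buffer


as_text = dump_text("RES\n1b:b-dglc-HEX-1:5\n")
stream = StringIO()
same_stream = dump_text("RES\n1b:b-dglc-HEX-1:5\n", stream)
```

Passing an open file or an existing StringIO lets several structures be written to the same stream, while omitting the buffer is convenient for one-off serialization.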

glypy.io.glycoct.load(stream, structure_class=<class 'glypy.structure.glycan.Glycan'>, allow_repeats=True, allow_multiple=True)[source]

Read all structures from the provided text stream.

Parameters
  • stream (file-like) – The text stream to parse structures from

  • structure_class (type, optional) – Glycan subclass to use

  • allow_repeats (bool, optional) – Whether or not to allow REP sections

Return type

Glycan or list of Glycan

glypy.io.glycoct.dumps(structure)[source]

Serialize the Glycan into GlycoCT{condensed}, returning the text as a string.

Parameters

structure (Glycan) – The structure to serialize

Return type

str

glypy.io.glycoct.loads(text, structure_class=<class 'glypy.structure.glycan.Glycan'>, allow_repeats=True, allow_multiple=True)[source]

Read all structures from the provided text string.

Parameters
  • text (str) – The text to parse structures from

  • structure_class (type, optional) – Glycan subclass to use

  • allow_repeats (bool, optional) – Whether or not to allow REP sections

Return type

Glycan or list of Glycan

exception glypy.io.glycoct.GlycoCTError[source]

Base error for GlycoCT-based parsing exceptions.

Examples

>>> from glypy.io import glycoct
>>> glycoct.loads("""RES
1b:x-dglc-HEX-1:5
2s:n-acetyl
3b:b-dglc-HEX-1:5
4s:n-acetyl
5b:b-dman-HEX-1:5
6b:a-dman-HEX-1:5
7b:b-dglc-HEX-1:5
8s:n-acetyl
9b:a-lgal-HEX-1:5|6:d
10b:b-dgal-HEX-1:5
11b:a-dgro-dgal-NON-2:6|1:a|2:keto|3:d
12s:n-glycolyl
13b:b-dglc-HEX-1:5
14s:n-acetyl
15b:b-dgal-HEX-1:5
16s:n-acetyl
17b:b-dglc-HEX-1:5
18s:n-acetyl
19b:a-dman-HEX-1:5
20b:b-dglc-HEX-1:5
21s:n-acetyl
22b:a-lgal-HEX-1:5|6:d
23b:b-dgal-HEX-1:5
24b:a-dgro-dgal-NON-2:6|1:a|2:keto|3:d
25s:n-glycolyl
26b:b-dglc-HEX-1:5
27s:n-acetyl
28b:a-lgal-HEX-1:5|6:d
29b:b-dgal-HEX-1:5
30b:a-dgro-dgal-NON-2:6|1:a|2:keto|3:d
31s:n-acetyl
32b:a-lgal-HEX-1:5|6:d
LIN
1:1d(2+1)2n
2:1o(4+1)3d
3:3d(2+1)4n
4:3o(4+1)5d
5:5o(3+1)6d
6:6o(2+1)7d
7:7d(2+1)8n
8:7o(3+1)9d
9:7o(4+1)10d
10:10o(3+2)11d
11:11d(5+1)12n
12:6o(4+1)13d
13:13d(2+1)14n
14:13o(4+1)15d
15:15d(2+1)16n
16:5o(4+1)17d
17:17d(2+1)18n
18:5o(6+1)19d
19:19o(2+1)20d
20:20d(2+1)21n
21:20o(3+1)22d
22:20o(4+1)23d
23:23o(3+2)24d
24:24d(5+1)25n
25:19o(6+1)26d
26:26d(2+1)27n
27:26o(3+1)28d
28:26o(4+1)29d
29:29o(3+2)30d
30:30d(5+1)31n
31:1o(6+1)32d
""")
>>>

../_images/glycoct-1.svg
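
To make the layout of the example text easier to follow, here is a minimal sketch that splits a small GlycoCT{condensed} record into its RES and LIN entries using only the standard library. The split_sections function and the regular expression are illustrative assumptions: they handle only plain linkage lines of the form shown in the example and do not reflect how glypy parses the format.

```python
import re

# Matches LIN lines such as "1:1d(2+1)2n":
# index, parent residue, parent linkage type,
# (parent position + child position), child residue, child linkage type.
LIN_PATTERN = re.compile(r"^(\d+):(\d+)([a-z])\((-?\d+)\+(-?\d+)\)(\d+)([a-z])$")


def split_sections(text):
    """Return (residues, links) for a record with RES and LIN sections only."""
    residues, links = {}, []
    section = None
    for line in text.strip().splitlines():
        line = line.strip()
        if line in ("RES", "LIN"):
            section = line
        elif section == "RES":
            # "2s:n-acetyl" -> index 2, kind "s" (substituent), descriptor "n-acetyl"
            head, descriptor = line.split(":", 1)
            residues[int(head[:-1])] = (head[-1], descriptor)
        elif section == "LIN":
            match = LIN_PATTERN.match(line)
            if match is None:
                raise ValueError("unsupported LIN line: %r" % line)
            _, parent, _, parent_pos, child_pos, child, _ = match.groups()
            links.append((int(parent), int(parent_pos), int(child_pos), int(child)))
    return residues, links


record = """RES
1b:x-dglc-HEX-1:5
2s:n-acetyl
3b:b-dglc-HEX-1:5
LIN
1:1d(2+1)2n
2:1o(4+1)3d
"""
residues, links = split_sections(record)
```

Each entry in links is a tuple of (parent residue, parent position, child position, child residue), so the first LIN line attaches the n-acetyl substituent to position 2 of residue 1.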

Object-Oriented Interface

class glypy.io.glycoct.GlycoCTReader(stream, structure_class=<class 'glypy.structure.glycan.Glycan'>, allow_repeats=True, completes=True)[source]

Parse GlycoCT{condensed} text data into Glycan objects.

The parser implements the Iterator interface, yielding successive glycans from a text stream separated by empty lines.
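
The idea of yielding one record per block of non-empty lines can be sketched with a small generator. The iter_records function below is a hypothetical illustration using only the standard library; it yields raw text blocks rather than Glycan objects and is not the reader's actual implementation.

```python
from io import StringIO


def iter_records(stream):
    """Yield blocks of consecutive non-empty lines from a text stream."""
    block = []
    for line in stream:
        stripped = line.strip()
        if stripped:
            block.append(stripped)
        elif block:
            # A blank line closes the current record.
            yield "\n".join(block)
            block = []
    if block:
        # The last record may not be followed by a blank line.
        yield "\n".join(block)


stream = StringIO("RES\n1b:b-dglc-HEX-1:5\n\nRES\n1b:a-dman-HEX-1:5\n")
records = list(iter_records(stream))
```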

The parser can understand fully specified and partially ambiguous structures. When allow_repeats is True and a REP section is encountered, it will be expanded to its minimum multiplicity, or 1 if the minimum is unknown. UND sections will be connected to the main graph by AmbiguousLink instead of Link objects.
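
The rule for choosing how many copies of a repeating unit to materialize can be sketched as follows. The function names are hypothetical, and representing an unknown minimum as None is a convention of this sketch only, not necessarily how glypy or the GlycoCT text encodes it.

```python
def repetition_count(minimum=None):
    """Number of copies to materialize for a repeating unit.

    `minimum` is the lower bound of the repeat range, or None when unknown
    (using None for "unknown" is an assumption made for this sketch).
    """
    return minimum if minimum is not None else 1


def expand_repeat(unit, minimum=None):
    """Return the unit's residues repeated repetition_count(minimum) times."""
    return [residue for _ in range(repetition_count(minimum)) for residue in unit]


unit = ["b-dgal-HEX-1:5", "b-dglc-HEX-1:5"]
known = expand_repeat(unit, minimum=3)
unknown = expand_repeat(unit)
```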

Variables
  • allow_repeats (bool) – Whether or not to permit REP sections. Defaults to True

  • completes (bool) – Whether or not to translate the built graph into a Glycan object. Defaults to True

  • handle (file-like) – The text file being read from

  • in_repeat (bool) – Indicates the parser is currently parsing a REP section’s sub-graph

  • in_undetermined (bool) – Indicates the parser is currently parsing a UND section’s sub-graph

  • postponed (list) – Holds all the deferred operations for the top-most graph as callable objects

  • root (Monosaccharide) – The root node of the produced graph

  • state (str) – The current state of the parser’s state machine

  • structure_class (type) – The Glycan sub-class to produce

  • repeats (dict) – Maps RES section index to RepeatedGlycoCTSubgraph

  • undetermineds (dict) – Maps UND section index to UndeterminedGlycoCTSubgraph

glypy.io.glycoct.GlycoCTWriter

alias of UNDOrderRespectingGlycoCTWriter

Implementation Details

class glypy.io.glycoct.RepeatedGlycoCTSubgraph(graph_index, repeat_index, internal_linkage=None, external_linkage=None, multitude=None, graph=None, parent=None)[source]

Implements the machinery for representing a repeated subgraph in GlycoCT.

Variables
  • graph_index (int) –

  • repeat_index (int) – The index i of this repeating subgraph within the graph.

  • internal_linkage (object) – The linkage connecting two repetitions of the subgraph

  • external_linkage (object) – The linkage connecting the final repetition to the outside nodes.

  • multitude (RepeatedMultitude) – Holds the lower and upper bounds on the number of times this subgraph may be repeated.

  • repetitions (OrderedDict) – The repetitions of this subgraph, materialized during postprocess()

  • postponed (deque) – A queue of post-processing callbacks.
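
A queue of post-processing callbacks suggests a deferred-execution pattern: work that cannot be done until the whole subgraph is known is stored as callables and run later. A minimal sketch of that pattern follows. The DeferredWork class and its postpone method are hypothetical and do not mirror glypy's internals; only the names postponed and postprocess are borrowed from the attribute list above.

```python
from collections import deque


class DeferredWork:
    """Hypothetical holder for callbacks that must run after parsing finishes."""

    def __init__(self):
        self.postponed = deque()

    def postpone(self, callback, *args):
        # Store the call instead of executing it immediately.
        self.postponed.append((callback, args))

    def postprocess(self):
        # Run callbacks in the order they were queued, emptying the queue.
        results = []
        while self.postponed:
            callback, args = self.postponed.popleft()
            results.append(callback(*args))
        return results


work = DeferredWork()
work.postpone(lambda a, b: "link %d -> %d" % (a, b), 1, 2)
work.postpone(lambda a, b: "link %d -> %d" % (a, b), 2, 3)
results = work.postprocess()
```

Using a deque keeps removal from the front of the queue cheap, which suits first-in, first-out processing.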

class glypy.io.glycoct.UndeterminedGlycoCTSubgraph(und_index, probability=None, parent_ids=None, subtree_linkages=None, graph=None, parent=None)[source]