Heuristic Similarity

A collection of routines for doing fuzzy matching of monosaccharides, as well as a set of predicates for classifying common properties of monosaccharides.

Core Heuristic

glypy.algorithms.similarity.monosaccharide_similarity(node, target, include_substituents=True, include_modifications=True, include_children=False, exact=True, ignore_reduction=False, ignore_ring=False, treat_null_as_wild=True, match_attachement_positions=False, short_circuit_after=None, visited=None)

A heuristic comparison for measuring similarity between monosaccharides.

Compares:
  1. ring_start and ring_end

  2. superclass

  3. configuration

  4. stem

  5. anomer

  6. If include_modifications, each modification

  7. If include_substituents, each substituent

  8. If include_children, each child Monosaccharide

The result is two numbers: the observed similarity between node and target, and the expected similarity, which is the similarity between target and itself. expected - observed is the number of differences observed between the two monosaccharides, which can be useful for expressing how far apart two monosaccharides are in feature space. For more distant similarity testing, especially when considering children, the ratio observed / expected might be used instead.
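As a rough illustration of the two summaries described above, the sketch below turns an (observed, expected) pair into a difference count and a ratio. The helper name summarize_similarity is hypothetical and not part of glypy, and the pair values are made up for the example.

```python
def summarize_similarity(observed, expected):
    """Return (differences, ratio) for an (observed, expected) pair.

    Hypothetical helper for illustration; not part of glypy.
    """
    differences = expected - observed
    # Guard against an empty comparison where nothing could be matched.
    ratio = observed / float(expected) if expected else 0.0
    return differences, ratio


# Made-up pair: 5 of 6 comparable features matched.
differences, ratio = summarize_similarity(5, 6)
print(differences, round(ratio, 3))
```

The difference is easiest to read when only a handful of features are compared; the ratio stays comparable when the number of features varies, for instance when children are included.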

Similarity is not symmetric: in general, a -> b != b -> a. A commutative version of similarity can be obtained by calculating both directions and taking the result with the smallest error.
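The asymmetry, and the approach of comparing in both directions, can be sketched with a toy stand-in for monosaccharide_similarity(). Everything below is an assumption made for illustration: toy_similarity compares plain dicts of traits and treats None on the target as a wildcard. It is not glypy's implementation.

```python
def toy_similarity(node, target):
    """Toy stand-in: count the traits of `node` that match `target`.

    A value of None on `target` (or a trait missing from it) is treated
    as matching anything, which makes the result direction-dependent.
    """
    observed = 0
    expected = 0
    for trait, value in node.items():
        expected += 1
        target_value = target.get(trait)
        if target_value is None or target_value == value:
            observed += 1
    return observed, expected


def smallest_error_similarity(a, b, similarity=toy_similarity):
    """Compare in both directions and keep the pair with the smallest error."""
    forward = similarity(a, b)
    backward = similarity(b, a)
    return min(forward, backward, key=lambda pair: pair[1] - pair[0])


specific = {"stem": "glc", "anomer": "alpha"}
vague = {"stem": "glc", "anomer": None}

print(toy_similarity(specific, vague))  # the target's None acts as a wildcard
print(toy_similarity(vague, specific))  # the node's None does not match "alpha"
print(smallest_error_similarity(specific, vague))
```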

Parameters
  • node (Monosaccharide) – The reference monosaccharide

  • target (Monosaccharide) – The monosaccharide to compare against

  • include_substituents (bool) – Include substituents in comparison (Defaults True)

  • include_modifications (bool) – Include modifications in comparison (Defaults True)

  • include_children (bool) – Include children in comparison (Defaults False)

  • exact (bool) – Penalize for having unmatched attachments (Defaults True)

  • ignore_reduction (bool) – If True, differences in reduction state are not counted as a mismatch (Defaults False)

  • ignore_ring (bool) – If True, differences in ring coordinates are not counted as a mismatch (Defaults False)

  • treat_null_as_wild (bool) – Whether or not to treat traits with a value of None or UnknownPosition as always matching when the null value is on the target residue (the residue that traits are being matched to).

  • short_circuit_after (None or Number) – Controls whether to quit comparing nodes if the difference becomes too large, useful for speeding up pessimistic comparisons

  • visited (set) – Tracks which node pairs have already been compared, in order to break cycles (Defaults None)

Returns

  • observed (int) – The number of observed features that matched

  • expected (int) – The number of features that could have been matched

Commutative Options

glypy.algorithms.similarity.commutative_similarity(node, target, tolerance=0, *args, **kwargs)

Apply monosaccharide_similarity() to node and target for both node --> target and target --> node, returning whether either comparison passes the tolerance threshold.

Return type

bool

glypy.algorithms.similarity.commutative_similarity_score(node, target, *args, **kwargs)

Apply monosaccharide_similarity() to node and target for both node --> target and target --> node, returning the larger of the two observed / expected ratios.

Returns

The maximal similarity score ratio

Return type

float

glypy.algorithms.similarity.commutative_similarity_score_with_tolerance(node, target, tolerance, *args, **kwargs)

Apply monosaccharide_similarity() to node and target for both node --> target and target --> node, returning the larger of the two observed / expected ratios, and whether either direction had an error within tolerance.

This can be viewed as a combination of commutative_similarity() and commutative_similarity_score() while making fewer calls to monosaccharide_similarity().

Returns

  • float – The maximal similarity score ratio

  • bool – Whether the difference passes error tolerance
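One way to picture the saving in calls: each direction is evaluated once, and both the ratio and the tolerance check are derived from those two results. The sketch below assumes a caller-supplied similarity callable that returns (observed, expected). The function name, the canned results, and the reading of the threshold as "error no larger than tolerance" are all assumptions, not glypy's code.

```python
def score_with_tolerance(node, target, tolerance, similarity):
    """Return (best_ratio, within_tolerance) from two similarity calls."""
    results = [similarity(node, target), similarity(target, node)]
    ratios = [obs / float(exp) if exp else 0.0 for obs, exp in results]
    best_ratio = max(ratios)
    # Assumed reading of the threshold: error no larger than tolerance.
    within_tolerance = any((exp - obs) <= tolerance for obs, exp in results)
    return best_ratio, within_tolerance


# Canned results standing in for the two directions of a real comparison.
canned = {("a", "b"): (5, 6), ("b", "a"): (6, 6)}
calls = []


def fake_similarity(node, target):
    """Record each call so the number of comparisons can be inspected."""
    calls.append((node, target))
    return canned[(node, target)]


result = score_with_tolerance("a", "b", 0, fake_similarity)
print(result)
print(len(calls))
```

Computing the score and the tolerance check separately would evaluate each direction twice; sharing the two results avoids that.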

Predicates

glypy.algorithms.similarity.has_substituent(monosaccharide, substituent)

Checks whether monosaccharide has any substituent groups matching substituent.

Return type

bool

glypy.algorithms.similarity.has_modification(monosaccharide, modification)

Checks whether monosaccharide has any modification sites matching modification.

Return type

bool

glypy.algorithms.similarity.has_monosaccharide(glycan, monosaccharide, tolerance=0, *args, **kwargs)

Checks whether glycan has any monosaccharide nodes matching monosaccharide within tolerance, using commutative_similarity().

Return type

bool

glypy.algorithms.similarity.is_reduced(obj)

A simple predicate to test whether an object has a reduced structure.

If obj does not have a reducing_end attribute, this will return False.

Parameters

obj (object) – The object to check

Return type

bool
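The missing-attribute behaviour described for is_reduced() can be sketched as follows. This is a guess at the pattern rather than glypy's code: it assumes that a reducing_end of None means "not reduced", and the class WithReducingEnd is a hypothetical stand-in for a real structure.

```python
def is_reduced_sketch(obj):
    """Report whether `obj` carries a non-None reducing_end attribute."""
    # Objects without a reducing_end attribute are reported as not reduced.
    return getattr(obj, "reducing_end", None) is not None


class WithReducingEnd(object):
    """Hypothetical stand-in for a structure with a reducing_end slot."""

    def __init__(self, reducing_end):
        self.reducing_end = reducing_end


print(is_reduced_sketch(WithReducingEnd(object())))
print(is_reduced_sketch(WithReducingEnd(None)))
print(is_reduced_sketch("no such attribute"))
```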

glypy.algorithms.similarity.is_amine(substituent)

A simple predicate to test whether a substituent has an amine group attached to the carbon backbone, judged by naming convention.

This predicate only checks whether the name of the substituent is “amino” or starts with the prefix “n_”.

Parameters

substituent (Substituent or str) – The object to test

Return type

bool
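The naming convention described for is_amine() amounts to a plain string test. The sketch below mirrors that description only. The helper name looks_like_amine is hypothetical, and reading the name from a name attribute on non-string inputs is an assumption about the Substituent type.

```python
def looks_like_amine(substituent):
    """Name-based check mirroring the description (hypothetical helper)."""
    # Accept a plain string, or an object assumed to expose `name`.
    if isinstance(substituent, str):
        name = substituent
    else:
        name = substituent.name
    return name == "amino" or name.startswith("n_")


for name in ("amino", "n_acetyl", "sulfate"):
    print(name, looks_like_amine(name))
```

Because the test looks only at the name, an amine that does not follow the convention would be missed, and any name beginning with "n_" would be accepted.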

glypy.algorithms.similarity.is_aminated(monosaccharide)

Tests to see if any substituents of monosaccharide are amines.

Each substituent is tested using is_amine(), with all the caveats that entails.

Parameters

monosaccharide (Monosaccharide) – The monosaccharide to test

Return type

bool

See also

is_amine

glypy.algorithms.similarity.is_generic_monosaccharide(monosaccharide)

Tests if the stem is unknown.

Parameters

monosaccharide (Monosaccharide) – The object to test

Return type

bool

glypy.algorithms.similarity.is_derivatized(monosaccharide)

Tests whether any of the substituents attached to monosaccharide were added by derivatization.

Parameters

monosaccharide (Monosaccharide) – The object to test

Return type

bool

Convenience Specializations

Some common predicates have been pre-bound using functools.partial. Their names should hopefully be self-explanatory.

glypy.algorithms.similarity.has_fucose()
glypy.algorithms.similarity.has_n_acetyl()
glypy.algorithms.similarity.is_acidic()
glypy.algorithms.similarity.is_sulfated()
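The pre-binding pattern itself can be sketched with functools.partial and a toy predicate. Which underlying predicate and arguments glypy binds for each of the names above is not stated here, so the functions and data below are hypothetical stand-ins.

```python
from functools import partial


def has_substituent_sketch(monosaccharide, substituent):
    """Toy predicate: `monosaccharide` is a dict listing substituent names."""
    return substituent in monosaccharide["substituents"]


# Pre-bind the substituent, leaving the monosaccharide to be supplied later.
has_n_acetyl_sketch = partial(has_substituent_sketch, substituent="n_acetyl")

with_n_acetyl = {"substituents": ["n_acetyl"]}
without_n_acetyl = {"substituents": []}

print(has_n_acetyl_sketch(with_n_acetyl))
print(has_n_acetyl_sketch(without_n_acetyl))
```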