Heuristic Similarity
A collection of routines for doing fuzzy matching of monosaccharides, as well as a set of predicates for classifying common properties of monosaccharides.
Core Heuristic
- glypy.algorithms.similarity.monosaccharide_similarity(node, target, include_substituents=True, include_modifications=True, include_children=False, exact=True, ignore_reduction=False, ignore_ring=False, treat_null_as_wild=True, match_attachement_positions=False, short_circuit_after=None, visited=None)
A heuristic comparison for measuring similarity between monosaccharides.
- Compares:
ring_start and ring_end
superclass
configuration
stem
anomer
If include_modifications, each modification
If include_substituents, each substituent
If include_children, each child Monosaccharide
The result is two numbers: the observed similarity between node and target, and the expected similarity between target and itself.
expected - observed is the number of differences observed between the two monosaccharides, which can be useful for expressing how far apart two monosaccharides are in feature space. For more distant similarity testing, especially when considering children, the ratio observed / expected might be used instead.
Similarity is not symmetric, e.g. a -> b != b -> a. A commutative version of similarity can be obtained by calculating both directions and taking the result with the smallest error.
- Parameters
node (Monosaccharide) – The reference monosaccharide
target (Monosaccharide) – The monosaccharide to compare against
include_substituents (bool) – Include substituents in comparison (Defaults True)
include_modifications (bool) – Include modifications in comparison (Defaults True)
include_children (bool) – Include children in comparison (Defaults False)
exact (bool) – Penalize for having unmatched attachments (Defaults True)
ignore_reduction (bool) – Whether to ignore differences in reduction state instead of counting them as a mismatch
ignore_ring (bool) – Whether to ignore differences in ring coordinates instead of counting them as a mismatch
treat_null_as_wild (bool) – Whether or not to treat traits with a value of None or UnknownPosition as always matching when the null value is on the target residue (the residue that traits are being matched to).
short_circuit_after (None or Number) – Controls whether to quit comparing nodes if the difference becomes too large, useful for speeding up pessimistic comparisons
visited (set) – Tracks which node pairs have already been compared to break cycles. This carries state across multiple calls to compare() and must be reset by calling reset() before reusing an instance on new structures.
- Returns
int (observed) – The number of observed features that matched
int (expected) – The number of features that could have been matched
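The relationship between observed, expected, their difference, and their ratio can be illustrated with a toy model. The sketch below does not use glypy: monosaccharides are plain dictionaries, and toy_similarity, TRAITS, and the trait values are hypothetical names chosen for illustration. It only mimics the documented behaviour that a null trait on the target matches anything.

```python
# Toy model only: monosaccharides are plain dicts of traits, not glypy objects.
TRAITS = ("superclass", "configuration", "stem", "anomer", "ring_start", "ring_end")


def toy_similarity(node, target, treat_null_as_wild=True):
    """Return (observed, expected) for matching `node` against `target`."""
    observed = 0
    for trait in TRAITS:
        wanted = target.get(trait)
        # A null trait on the target acts as a wildcard when requested.
        if node.get(trait) == wanted or (treat_null_as_wild and wanted is None):
            observed += 1
    # expected is what `target` scores against itself: every trait matches.
    expected = len(TRAITS)
    return observed, expected


glucose = dict(superclass="hex", configuration="d", stem="glc",
               anomer="beta", ring_start=1, ring_end=5)
generic_hexose = dict(superclass="hex", configuration=None, stem=None,
                      anomer=None, ring_start=None, ring_end=None)

obs_ab, exp_ab = toy_similarity(glucose, generic_hexose)
obs_ba, exp_ba = toy_similarity(generic_hexose, glucose)

differences_ab = exp_ab - obs_ab   # 0: the generic target accepts glucose
differences_ba = exp_ba - obs_ba   # 5: only the superclass matches
ratio_ba = obs_ba / exp_ba         # 1/6
```

The two directions disagree because the wildcard rule only applies to nulls on the target, which is one way the asymmetry described above can arise.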
Commutative Options
- glypy.algorithms.similarity.commutative_similarity(node, target, tolerance=0, *args, **kwargs)
Apply monosaccharide_similarity() to node and target for both node --> target and target --> node, returning whether either comparison passes the tolerance threshold.
- Parameters
node (Monosaccharide) – The reference monosaccharide
target (Monosaccharide) – The monosaccharide to compare against
tolerance (int, optional) – The maximum number of errors to tolerate
*args – Forwarded to monosaccharide_similarity()
**kwargs – Forwarded to monosaccharide_similarity()
- Return type
bool
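A minimal sketch of the two-direction tolerance check, assuming that "passing" means expected - observed is at most tolerance. It is not glypy's implementation: toy_commutative_similarity and count_matches are hypothetical helpers, and the inputs are plain dictionaries rather than Monosaccharide objects.

```python
def toy_commutative_similarity(node, target, similarity, tolerance=0):
    """True if either direction has at most `tolerance` differences."""
    for a, b in ((node, target), (target, node)):
        observed, expected = similarity(a, b)
        if expected - observed <= tolerance:
            return True
    return False


def count_matches(node, target):
    # Hypothetical stand-in similarity: a None on the target is a wildcard.
    observed = sum(1 for key, value in target.items()
                   if value is None or node.get(key) == value)
    return observed, len(target)


specific = {"stem": "glc", "anomer": "beta"}
generic = {"stem": "glc", "anomer": None}
other = {"stem": "gal", "anomer": "alpha"}

matches_generic = toy_commutative_similarity(specific, generic, count_matches)
matches_other = toy_commutative_similarity(specific, other, count_matches)
matches_other_loose = toy_commutative_similarity(
    specific, other, count_matches, tolerance=2)
```

With the default tolerance of 0 only an exact match in at least one direction passes; raising the tolerance admits pairs that differ in that many traits.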
- glypy.algorithms.similarity.commutative_similarity_score(node, target, *args, **kwargs)
Apply monosaccharide_similarity() to node and target for both node --> target and target --> node, returning the maximally normalized ratio of observed / expected.
- Parameters
node (Monosaccharide) – The reference monosaccharide
target (Monosaccharide) – The monosaccharide to compare against
*args – Forwarded to monosaccharide_similarity()
**kwargs – Forwarded to monosaccharide_similarity()
- Returns
The maximal similarity score ratio
- Return type
float
- glypy.algorithms.similarity.commutative_similarity_score_with_tolerance(node, target, tolerance, *args, **kwargs)
Apply monosaccharide_similarity() to node and target for both node --> target and target --> node, returning the maximally normalized ratio score, and whether there was a pair error less than tolerance.
This can be viewed as a combination of commutative_similarity() and commutative_similarity_score() while making fewer calls to monosaccharide_similarity().
- Parameters
node (Monosaccharide) – The reference monosaccharide
target (Monosaccharide) – The monosaccharide to compare against
tolerance (int) – The maximum number of errors to tolerate
*args – Forwarded to monosaccharide_similarity()
**kwargs – Forwarded to monosaccharide_similarity()
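- Returns
float – The maximal similarity score ratio
bool – Whether either direction had an error within tolerance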
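The saving in calls can be sketched as follows: each direction is evaluated once, and both the best ratio and the tolerance check are derived from the same two results. This is an illustrative toy, not glypy's code. The names toy_score_with_tolerance and counting_similarity, the order of the returned pair, and the choice to treat an error equal to tolerance as passing are all assumptions.

```python
def toy_score_with_tolerance(node, target, similarity, tolerance):
    """Return (best observed/expected ratio, whether either error <= tolerance)."""
    # Exactly two similarity calls, one per direction.
    results = [similarity(node, target), similarity(target, node)]
    best_ratio = max(observed / expected for observed, expected in results)
    within = any(expected - observed <= tolerance
                 for observed, expected in results)
    return best_ratio, within


calls = []  # records each (node, target) pair so the call count is visible


def counting_similarity(node, target):
    # Hypothetical stand-in: compares every key except the label "name".
    calls.append((node["name"], target["name"]))
    keys = [k for k in target if k != "name"]
    observed = sum(1 for k in keys
                   if target[k] is None or node.get(k) == target[k])
    return observed, len(keys)


glc = {"name": "glc", "stem": "glc", "anomer": "beta", "superclass": "hex"}
gal = {"name": "gal", "stem": "gal", "anomer": "beta", "superclass": "hex"}

ratio, within = toy_score_with_tolerance(
    glc, gal, counting_similarity, tolerance=1)
```

Calling a separate pass/fail helper and a separate scoring helper would evaluate each direction up to twice; sharing the two results avoids that.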
Predicates
- glypy.algorithms.similarity.has_substituent(monosaccharide, substituent)
Checks whether monosaccharide has any substituent groups matching substituent.
- Parameters
monosaccharide (Monosaccharide) – The monosaccharide to check
substituent (Substituent or str) – The substituent to check for
- Return type
bool
- glypy.algorithms.similarity.has_modification(monosaccharide, modification)
Checks whether monosaccharide has any modification sites matching modification.
- Parameters
monosaccharide (Monosaccharide) – The monosaccharide to check
modification (Modification or str) – The modification to check for
- Return type
bool
- glypy.algorithms.similarity.has_monosaccharide(glycan, monosaccharide, tolerance=0, *args, **kwargs)
Checks whether glycan has any monosaccharide nodes matching monosaccharide within tolerance using commutative_similarity().
- Parameters
glycan (SaccharideCollection) – The glycan structure or composition to search
monosaccharide (Monosaccharide) – The monosaccharide to search for
tolerance (int, optional) – The error tolerance to use
*args – Forwarded to monosaccharide_similarity()
**kwargs – Forwarded to monosaccharide_similarity()
- Return type
bool
- glypy.algorithms.similarity.is_reduced(obj)
A simple predicate to test whether an object has a reduced structure.
If obj does not have a reducing_end attribute, this will return False.
- glypy.algorithms.similarity.is_amine(substituent)
A simple predicate to test, by naming convention, whether a substituent has an amine group attached directly to the carbon backbone.
This predicate only checks whether the name of the substituent is “amino” or starts with the prefix “n_”.
- Parameters
substituent (Substituent or str) – The object to test
- Return type
bool
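The naming rule described for is_amine can be sketched in a few lines. This is a stand-in written from the description above, not the library's source: toy_is_amine is a hypothetical name, and reading the name from a .name attribute on non-string inputs is an assumption.

```python
def toy_is_amine(substituent):
    """Name-based check: 'amino' itself, or any name starting with 'n_'."""
    # Accept either a plain string or an object exposing `.name` (assumed).
    name = substituent if isinstance(substituent, str) else substituent.name
    return name == "amino" or name.startswith("n_")


class FakeSubstituent:
    # Minimal hypothetical stand-in for a substituent object.
    def __init__(self, name):
        self.name = name


checks = {
    "amino": toy_is_amine("amino"),
    "n_acetyl": toy_is_amine(FakeSubstituent("n_acetyl")),
    "methyl": toy_is_amine("methyl"),
    "sulfate": toy_is_amine(FakeSubstituent("sulfate")),
}
```

Because the test is purely lexical, a substituent that carries nitrogen but is named differently would not be recognised, which is the caveat the description points at.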
- glypy.algorithms.similarity.is_aminated(monosaccharide)
Tests to see if any substituents of monosaccharide are amines.
Each substituent is tested using is_amine(), with all the caveats that entails.
- Parameters
monosaccharide (Monosaccharide) – The monosaccharide to test
- Return type
bool
See also
is_amine()
- glypy.algorithms.similarity.is_generic_monosaccharide(monosaccharide)
Tests if the stem is unknown.
- Parameters
monosaccharide (Monosaccharide) – The object to test
- Return type
bool
- glypy.algorithms.similarity.is_derivatized(monosaccharide)
Tests whether any of the substituents attached to monosaccharide were added by derivatization.
- Parameters
monosaccharide (Monosaccharide) – The object to test
- Return type
bool