Substructure Search Methods¶
A set of algorithms for finding similar structures and substructures.
Inclusion Comparison Algorithms¶
Inclusion-based search asks whether one graph is completely contained in another sub-graph,
using either Exact Matching or Topological Matching according to the chosen
approach. These comparisons can be made fuzzy, using commutative_similarity()
to evaluate matches between individual nodes in a graph.
Direct Comparators¶
These direct algorithms compare two structures starting from the provided root nodes, using
their particular matching strategy. They are useful for immediate comparisons of similarity,
returning the maximal commutative_similarity()
of their traversal, but they
do not consider non-root-to-root comparisons.
- glypy.algorithms.subtree_search.topological_inclusion(target, reference, substituents=True, tolerance=0, visited=None)¶
A generalization of
topological_equality()
which allows fortarget
to be matched toreference
, but forreference
to include more. Consequently, this method is not commutative.
- Parameters
target (
Monosaccharide
) – The monosaccharide to comparereference (
Monosaccharide
) – The monosaccharide to compare againstsubstituents (
bool
, optional) – Whether or not to compare with the substituents of each node. Defaults toTrue
.tolerance (
float
, optional) – The maximum difference to permit between nodes for inclusion. Defaults to 0- Returns
The inclusion score. Greater than 0 indicates topological inclusion, though larger corresponds to better alignment.
- Return type
- glypy.algorithms.subtree_search.exact_ordering_inclusion(target, reference, substituents=True, tolerance=0, visited=None)[source]¶
A generalization of
exact_ordering_equality()
which allows fortarget
to be matched toreference
, but forreference
to include more. Consequently, this method is not commutative.
- Parameters
target (
Monosaccharide
) – The monosaccharide to comparereference (
Monosaccharide
) – The monosaccharide to compare againstsubstituents (
bool
, optional) – Whether or not to compare with the substituents of each node. Defaults toTrue
.tolerance (
float
, optional) – The maximum difference to permit between nodes for inclusion. Defaults to 0- Returns
The similarity score between the two structures. A score of 0 means no inclusion, and a non-zero score means inclusion, with larger scores indicating more node pairs matching.
- Return type
See also
commutative_similarity_score_with_tolerance
,topological_inclusion
Substructure Inclusion-based Algorithms¶
Building off the Direct Comparators, these algorithms answer higher level questions about whether one structure is included in another.
- glypy.algorithms.subtree_search.subtree_of(subtree, tree, exact=False, include_substituents=True, tolerance=0)[source]¶
Test to see if
subtree
is included intree
anywhere. Returns the node id number of the first occurence ofsubtree
included intree
orNone
if it is not found.
- Parameters
subtree (
Glycan
) – The structure to search for. The search attempts to match the complete structure of subtree.tree (
Glycan
) – The sturcture to search in. The search iterates over each residue intree
and calls a comparator function, comparing thesubtree
to the substructure rooted at that residue.exact (
bool
) – IfTrue
, useexact_ordering_inclusion()
to compare nodes. Otherwise usetopological_inclusion()
. Defaults toFalse
.- Return type
int
orNone
if no match
- glypy.algorithms.subtree_search.find_matching_subtree_roots(subtree, tree, exact=False, include_substituents=True, tolerance=0)[source]¶
Find the list of nodes where occurences of
subtree
included intree
are rooted.
- Parameters
subtree (
Glycan
) – The structure to search for. The search attempts to match the complete structure of subtree.tree (
Glycan
) – The sturcture to search in. The search iterates over each residue intree
and calls a comparator function, comparing thesubtree
to the substructure rooted at that residue.exact (
bool
) – IfTrue
, useexact_ordering_inclusion()
to compare nodes. Otherwise usetopological_inclusion()
. Defaults toFalse
.- Return type
Partial Subgraph Walks¶
- glypy.algorithms.subtree_search.walk_with(query, reference, visited=None, comparator=<function commutative_similarity>, include_substituents=True)[source]¶
Walk the
query
alongreference
, yielding successive matched nodes along a subtree ofreference
using a Exact Matching traversal.This function provides slightly more detail than
subtree_of()
as it exposes every step along the subgraph traversal, rather than just the root.
- Parameters
query (
Glycan
) – The query structure to search withreference (
Glycan
) – The reference structure to search invisited (
set
, optional) – The set of node id pairs to ignorecomparator (
Callable
, optional) – TheCallable
object which can compare twoMonosaccharide
- Yields
(
Monosaccharide
,Monosaccharide
) – Pairs of matched nodes along a path
Common Substructure-based Algorithms¶
- glypy.algorithms.subtree_search.maximum_common_subgraph(seq_a, seq_b, exact=True)¶
Find the maximum common subgraph between
seq_a
andseq_b
.
- Parameters
- Return type
References
[1] K. F. Aoki, A. Yamaguchi, Y. Okuno, T. Akutsu, N. Ueda, M. Kanehisa, and H. Mamitsuka, “Efficient tree-matching methods for accurate carbohydrate database queries.” Genome Inform. Jan. 2003.
- glypy.algorithms.subtree_search.n_saccharide_similarity(self, other, n=2, exact=False)[source]¶
Calculate n-saccharide similarity between two structures
- Parameters
- Returns
score – How similar these structures are at the n-saccharide level. Ranges between 0 and 1.0 where 1.0 is exactly the same, while 0.0 means no shared n-saccharides.
- Return type
References
[1] K. F. Aoki, A. Yamaguchi, Y. Okuno, T. Akutsu, N. Ueda, M. Kanehisa, and H. Mamitsuka, “Efficient tree-matching methods for accurate carbohydrate database queries.” Genome Inform. Jan. 2003.
Implementation Details of maximum_common_subgraph()
¶
- class glypy.algorithms.subtree_search.MaximumCommonSubgraphSolver(seq_a, seq_b, exact=True)[source]¶
Find the maximum common subgraph between
seq_a
andseq_b
.
- Variables
References
[1] K. F. Aoki, A. Yamaguchi, Y. Okuno, T. Akutsu, N. Ueda, M. Kanehisa, and H. Mamitsuka, “Efficient tree-matching methods for accurate carbohydrate database queries.” Genome Inform. Jan. 2003.
Treelets¶
- glypy.algorithms.subtree_search.treelets(glycan, k, distinct=True)[source]¶
Iterator over all distinct \(k\)-treelets of
tree
, all unique sub-trees oftree
withk
monosaccharides.
- glypy.algorithms.subtree_search.treelet_enrichment(cond1, cond2, k, distinct=True)¶
Perform a test for each treelet to determine if it is enriched in cond1 compared to cond2.
- Parameters
cond1 (list of
Glycan
) – Glycans from the first condition, the condition to be tested for enrichmentcond2 (list of
Glycan
) – Glycans from the second condition, the background conditionk (int) – The size of the treelet to use
distinct (bool) – Whether or not to count redundant treelets. Defaults to
True
- Returns
A mapping from treelet to p value from Fisher’s Exact Test for that treelet being found with its observed frequency in cond1 if it is as common in cond1 as in cond2.
- Return type
References
Pevzner, P., & Shamir, R. (2011). Bioinformatics for Biologists. New York, NY, USA: Cambridge University Press.
Implementation Details¶
Treelet methods are built atop a more complex object system
- class glypy.algorithms.subtree_search.Treelet(subtree, frontier_ids)[source]¶
Represents a subgraph of a larger
Glycan
, with a frontier of node ids which are children of the current subgraph.