OpenMS 2.6.0
Extended Aho-Corasick algorithm capable of matching ambiguous amino acids in the searched text (i.e. proteins).
#include <OpenMS/ANALYSIS/ID/AhoCorasickAmbiguous.h>
| Public Types | |
| typedef ::seqan::StringSet<::seqan::AAString > | PeptideDB | 
| typedef ::seqan::Pattern< PeptideDB, ::seqan::FuzzyAC > | FuzzyACPattern | 
| Public Member Functions | |
| AhoCorasickAmbiguous () | |
| Default Ctor; call setProtein() before using findNext(). | |
| AhoCorasickAmbiguous (const String &protein_sequence) | |
| Prepare to start searching for hits in a new protein sequence. | |
| void | setProtein (const String &protein_sequence) | 
| Reset to new protein sequence. All previous data is forgotten. | |
| bool | findNext (const FuzzyACPattern &pattern) | 
| Enumerate hits. | |
| Size | getHitDBIndex () | 
| Get index of hit into peptide database of the pattern. | |
| Int | getHitProteinPosition () | 
| Offset into protein sequence where hit was found. | |
| Static Public Member Functions | |
| static void | initPattern (const PeptideDB &pep_db, const int aaa_max, const int mm_max, FuzzyACPattern &pattern) | 
| Construct a trie from a set of peptide sequences (which are to be found in a protein). | |
| Private Types | |
| typedef FuzzyACPattern::KeyWordLengthType | KeyWordLengthType | 
| Private Attributes | |
| ::seqan::Finder< seqan::AAString > | finder_ | 
| locate the next peptide hit in protein | |
| ::seqan::AAString | protein_ | 
| the protein sequence - we need to store it since the finder only keeps a pointer to protein when constructed | |
| ::seqan::PatternAuxData< PeptideDB > | dh_ | 
| auxiliary data to hold a state after searching | |
Extended Aho-Corasick algorithm capable of matching ambiguous amino acids in the searched text (i.e. proteins).
...
Features:
+ blazingly fast
+ low memory usage
+ the number of allowed ambiguous amino acids (ambAA's) can be capped by the user (default 3).
This implementation is based on the original AC in SeqAn.
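To make the matching rule concrete, the following is a minimal brute-force sketch of what a single hit means, not the trie-based implementation used here. The helper names `ambiguousCovers` and `matchesAt` are hypothetical. The sketch assumes the common IUPAC ambiguity codes (B = D or N, Z = E or Q, X = any residue); which codes the class actually expands is not stated on this page. It also assumes that a position is simply rejected once the ambiguity budget is spent.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of OpenMS): can the protein residue `prot`
// stand for the peptide residue `pep`? Assumes the IUPAC ambiguity codes
// B = D or N, Z = E or Q, X = any residue.
inline bool ambiguousCovers(char prot, char pep)
{
  switch (prot)
  {
    case 'B': return pep == 'D' || pep == 'N';
    case 'Z': return pep == 'E' || pep == 'Q';
    case 'X': return true;
    default:  return false;
  }
}

// Hypothetical helper: does `peptide` match `protein` at offset `pos` while
// spending at most `aaa_max` ambiguous residues and `mm_max` mismatches?
inline bool matchesAt(const std::string& protein, std::size_t pos,
                      const std::string& peptide, int aaa_max, int mm_max)
{
  // The peptide must fit completely into the remaining protein.
  if (pos > protein.size() || peptide.size() > protein.size() - pos) return false;

  int aaa_used = 0;
  int mm_used = 0;
  for (std::size_t i = 0; i < peptide.size(); ++i)
  {
    const char prot = protein[pos + i];
    const char pep = peptide[i];
    if (prot == pep) continue;                   // exact match costs nothing
    if (ambiguousCovers(prot, pep))
    {
      if (++aaa_used > aaa_max) return false;    // ambiguity budget exhausted
    }
    else if (++mm_used > mm_max) return false;   // mismatch budget exhausted
  }
  return true;
}
```

For example, `matchesAt("AAPEPTIBEKK", 2, "PEPTIDE", 1, 0)` accepts the hit because B can stand for D, while the same call with `aaa_max = 0` rejects it.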
| typedef ::seqan::Pattern<PeptideDB, ::seqan::FuzzyAC> FuzzyACPattern | 
| typedef FuzzyACPattern::KeyWordLengthType KeyWordLengthType | private | 
| typedef ::seqan::StringSet<::seqan::AAString> PeptideDB | 
| AhoCorasickAmbiguous () | inline | 
Default Ctor; call setProtein() before using findNext().
| AhoCorasickAmbiguous (const String &protein_sequence) | inline | 
Prepare to start searching for hits in a new protein sequence.
This only sets the sequence. No computation is performed. Use findNext() to enumerate the hits.
| protein_sequence | Sequence (ambiguous characters allowed) | 
References AhoCorasickAmbiguous::setProtein().
| bool findNext (const FuzzyACPattern &pattern) | inline | 
Enumerate hits.
| pattern | The pattern (i.e. trie) created with initPattern(). | 
References AhoCorasickAmbiguous::dh_, seqan::find(), and AhoCorasickAmbiguous::finder_.
Referenced by PeptideIndexing::addHits_().
| Size getHitDBIndex () | inline | 
Get index of hit into peptide database of the pattern.
Only valid if findNext() returned true before.
References AhoCorasickAmbiguous::dh_, and seqan::position().
Referenced by PeptideIndexing::addHits_().
| Int getHitProteinPosition () | inline | 
Offset into protein sequence where hit was found.
Only valid if findNext() returned true before.
References AhoCorasickAmbiguous::finder_, and seqan::position().
Referenced by PeptideIndexing::addHits_().
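The call order described above (set the protein, call findNext() until it returns false, and read the two getters only after a successful call) can be sketched with a self-contained toy. `ToyHitEnumerator` and `collectHits` are hypothetical stand-ins that mirror the method names of this class. The toy does exact matching with a naive scan over a plain `std::vector<std::string>` rather than a `FuzzyACPattern`, so the order in which it reports hits is a property of the toy and may differ from the real class.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in (NOT the OpenMS class): same call protocol, naive exact search.
class ToyHitEnumerator
{
public:
  // Reset to a new protein; all state from previous searches is dropped.
  void setProtein(const std::string& protein)
  {
    protein_ = protein;
    pos_ = 0;
    db_index_ = 0;
    has_hit_ = false;
  }

  // Advance to the next hit; returns false once the protein is exhausted.
  bool findNext(const std::vector<std::string>& peptides)
  {
    if (has_hit_) { ++db_index_; has_hit_ = false; }  // resume after last hit
    for (; pos_ < protein_.size(); ++pos_, db_index_ = 0)
    {
      for (; db_index_ < peptides.size(); ++db_index_)
      {
        const std::string& pep = peptides[db_index_];
        if (!pep.empty() && protein_.compare(pos_, pep.size(), pep) == 0)
        {
          has_hit_ = true;
          return true;
        }
      }
    }
    return false;
  }

  // Both getters are only meaningful right after findNext() returned true.
  std::size_t getHitDBIndex() const { assert(has_hit_); return db_index_; }
  std::size_t getHitProteinPosition() const { assert(has_hit_); return pos_; }

private:
  std::string protein_;
  std::size_t pos_ = 0;
  std::size_t db_index_ = 0;
  bool has_hit_ = false;
};

// Hypothetical driver: collect (peptide index, protein offset) for every hit.
inline std::vector<std::pair<std::size_t, std::size_t>>
collectHits(const std::string& protein, const std::vector<std::string>& peptides)
{
  std::vector<std::pair<std::size_t, std::size_t>> hits;
  ToyHitEnumerator enumerator;
  enumerator.setProtein(protein);
  while (enumerator.findNext(peptides))
  {
    hits.emplace_back(enumerator.getHitDBIndex(), enumerator.getHitProteinPosition());
  }
  return hits;
}
```

The same peptide list can be reused across proteins by calling `setProtein()` again, which is the reuse pattern the real pattern object is meant for.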
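| static void initPattern (const PeptideDB &pep_db, const int aaa_max, const int mm_max, FuzzyACPattern &pattern) | inline, static | 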
Construct a trie from a set of peptide sequences (which are to be found in a protein).
Peptides must not contain ambiguous characters or unknown characters (such as J or U); an exception is thrown otherwise. Ambiguous characters are only allowed in protein sequences.
Usage: Build the pattern only once and use it multiple times when running findNext().
| pep_db | Set of peptides | 
| aaa_max | Maximum number of ambiguous characters allowed in the matching part of the protein sequence | 
| mm_max | Maximum number of mismatches allowed in the matching part of the protein sequence | 
| pattern | The pattern to be created | 
| Exception::InvalidValue | if a peptide contains an unknown (U,J,...) or ambiguous character | 
Referenced by PeptideIndexing::run().
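The precondition on the peptide database can be checked up front, before the pattern is built. The sketch below is an illustration only: `isValidPeptide` and `validatePeptides` are hypothetical helpers, `std::invalid_argument` stands in for `Exception::InvalidValue`, and the accepted alphabet is assumed to be the 20 standard one-letter residue codes, which this page does not spell out.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Assumption: only the 20 standard residues are acceptable in a peptide.
// Ambiguous codes (B, Z, X) and codes such as J or U are therefore rejected.
inline bool isValidPeptide(const std::string& peptide)
{
  static const std::string allowed = "ACDEFGHIKLMNPQRSTVWY";
  return peptide.find_first_not_of(allowed) == std::string::npos;
}

// Hypothetical pre-check: throw on the first peptide that violates the rule.
// std::invalid_argument is a stand-in for the library's own exception type.
inline void validatePeptides(const std::vector<std::string>& pep_db)
{
  for (std::size_t i = 0; i < pep_db.size(); ++i)
  {
    if (!isValidPeptide(pep_db[i]))
    {
      throw std::invalid_argument("peptide #" + std::to_string(i) +
                                  " contains an ambiguous or unknown character: " +
                                  pep_db[i]);
    }
  }
}
```

Running such a check once on the whole database fits the build-once usage: the validation cost is paid when the peptide set is assembled, not per protein.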
| void setProtein (const String &protein_sequence) | inline | 
Reset to new protein sequence. All previous data is forgotten.
References AhoCorasickAmbiguous::dh_, AhoCorasickAmbiguous::finder_, AhoCorasickAmbiguous::protein_, and PatternAuxData< TNeedle >::reset().
Referenced by PeptideIndexing::addHits_(), and AhoCorasickAmbiguous::AhoCorasickAmbiguous().
| ::seqan::PatternAuxData< PeptideDB > dh_ | private | 
auxiliary data to hold a state after searching
Referenced by AhoCorasickAmbiguous::findNext(), AhoCorasickAmbiguous::getHitDBIndex(), and AhoCorasickAmbiguous::setProtein().
| ::seqan::Finder< seqan::AAString > finder_ | private | 
locate the next peptide hit in protein
Referenced by AhoCorasickAmbiguous::findNext(), AhoCorasickAmbiguous::getHitProteinPosition(), and AhoCorasickAmbiguous::setProtein().
| ::seqan::AAString protein_ | private | 
the protein sequence - we need to store it since the finder only keeps a pointer to protein when constructed
Referenced by AhoCorasickAmbiguous::setProtein().
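The reason for keeping a copy of the protein can be shown with a small lifetime sketch. `ToyCursor` and `ToySearcher` are hypothetical types, not SeqAn or OpenMS classes; the cursor only points at its text, as the finder is described to do above, so the searcher owns the string and points the cursor at its own member.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical cursor that, like the finder described above, does not own
// the text it walks over; it merely points at it.
struct ToyCursor
{
  const std::string* text = nullptr;
  std::size_t pos = 0;
};

class ToySearcher
{
public:
  ToySearcher() = default;
  // Copying would leave the copy's cursor pointing into the original object.
  ToySearcher(const ToySearcher&) = delete;
  ToySearcher& operator=(const ToySearcher&) = delete;

  void setProtein(const std::string& protein)
  {
    protein_ = protein;        // own a copy, so the caller's string may go away
    cursor_.text = &protein_;  // point at the member, never at the argument
    cursor_.pos = 0;
  }

  // Text seen through the cursor; requires a previous setProtein() call.
  const std::string& text() const
  {
    assert(cursor_.text != nullptr);
    return *cursor_.text;
  }

private:
  std::string protein_;
  ToyCursor cursor_;
};
```

Had the cursor pointed at the argument of `setProtein()` instead, it would dangle as soon as the caller passed a temporary or let its string go out of scope.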