Position-specific scoring matrices are a popular choice for modelling signals or motifs in biological sequences, in both DNA and protein contexts. Much effort has been dedicated to defining suitable scores and thresholds that increase the specificity of the model and the sensitivity of the search. Surprisingly, until very recently little attention was paid to the actual process of finding the matches of the matrices in a set of sequences once the score and the threshold have been fixed. In fact, most profile matching tools still rely on a simple sliding window approach to scan the input sequences. This can be very time-consuming when searching for hits of a large set of scoring matrices in a sequence database. In this paper we survey the approaches proposed to speed up profile matching, based on statistical significance, multipattern matching, filtering, indexing data structures, matrix partitioning, the Fast Fourier Transform, and data compression. These approaches improve the expected search time of profile matching, leading to faster tools in practice.
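As a point of reference for the baseline mentioned above, the following is a minimal sketch of a sliding window scan, not the implementation of any particular tool. The function name `sliding_window_hits`, the toy matrix `toy_pssm`, and the representation of a matrix as a list of per-position score dictionaries are all assumptions made for illustration. The sketch also assumes every sequence symbol has an entry in each matrix column.

```python
from typing import Dict, List, Tuple

# Assumed representation: one dict of per-symbol scores for each motif position.
Matrix = List[Dict[str, float]]


def sliding_window_hits(
    seq: str, pssm: Matrix, threshold: float
) -> List[Tuple[int, float]]:
    """Return (start, score) for every window whose score reaches the threshold."""
    m = len(pssm)
    hits = []
    # Slide a window of length m over the sequence, one position at a time.
    for start in range(len(seq) - m + 1):
        score = 0.0
        # The window score is the sum of the per-position symbol scores.
        for j in range(m):
            score += pssm[j][seq[start + j]]
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical toy matrix that favours the motif "ACG".
toy_pssm = [
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": 2, "G": -1, "T": -1},
    {"A": -1, "C": -1, "G": 2, "T": -1},
]

hits = sliding_window_hits("TACGACGT", toy_pssm, threshold=5)
print(hits)
```

Each window is scored from scratch, so scanning a sequence of length n with a matrix of length m takes on the order of n times m steps, repeated for every matrix in the set. This is the cost that the surveyed approaches aim to reduce.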