Fast profile matching algorithms – A survey

Authors:
Cinzia Pizzi; Esko Ukkonen
Affiliations:
Department of Computer Science, University of Helsinki, Finland; Helsinki Institute for Information Technology, University of Helsinki and Helsinki University of Technology, Finland
Venue:
Theoretical Computer Science
Year:
2008

Abstract

Position-specific scoring matrices are a popular choice for modelling signals or motifs in biological sequences, both DNA and protein. Considerable effort has been devoted to defining suitable scores and thresholds that increase the specificity of the model and the sensitivity of the search. Surprisingly, until very recently little attention was paid to the actual process of finding the matches of the matrices in a set of sequences once the score and the threshold have been fixed. In fact, most profile matching tools still rely on a simple sliding window approach to scan the input sequences. This can be very time-consuming when searching for hits of a large set of scoring matrices in a sequence database. This paper surveys proposed approaches to speeding up profile matching, based on statistical significance, multipattern matching, filtering, indexing data structures, matrix partitioning, the Fast Fourier Transform and data compression. These approaches improve the expected search time of profile matching and thus lead to faster tools in practice.
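
For orientation, the sliding window baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration and not code from the survey: the function name `sliding_window_scan`, the list-of-dicts matrix layout and the toy scores are all assumptions made for the example. Each window of length m is scored by summing one matrix entry per position, so a sequence of length n costs on the order of m·n operations per matrix, which is the cost the surveyed methods aim to reduce.

```python
from typing import Dict, List, Tuple

# Assumed layout (hypothetical): one dict of per-symbol scores per motif position.
Matrix = List[Dict[str, float]]


def sliding_window_scan(
    seq: str, matrix: Matrix, threshold: float
) -> List[Tuple[int, float]]:
    """Return (start, score) for every window whose score reaches the threshold."""
    m = len(matrix)
    hits: List[Tuple[int, float]] = []
    # Slide a window of length m over the sequence, one position at a time.
    for start in range(len(seq) - m + 1):
        score = 0.0
        # Sum the position-specific score of each symbol in the window.
        for j in range(m):
            score += matrix[j][seq[start + j]]
        if score >= threshold:
            hits.append((start, score))
    return hits


# Toy matrix (made-up scores) that favours the motif "ACG".
toy_matrix: Matrix = [
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": 2, "G": -1, "T": -1},
    {"A": -1, "C": -1, "G": 2, "T": -1},
]

hits = sliding_window_scan("TTACGACGT", toy_matrix, threshold=5)
print(hits)
```

With this toy input the scan reports the two occurrences of "ACG", at start positions 2 and 5. Searching a large set of matrices this way repeats the full scan once per matrix.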