Fast search algorithms for position specific scoring matrices

  • Authors:
  • Cinzia Pizzi;Pasi Rastas;Esko Ukkonen

  • Affiliations:
  • Department of Computer Science and Helsinki Institute for Information Technology HIIT, University of Helsinki, Finland (all authors)

  • Venue:
  • BIRD'07 Proceedings of the 1st international conference on Bioinformatics research and development
  • Year:
  • 2007

Abstract

Fast search algorithms for finding good instances of patterns given as position specific scoring matrices are developed, and empirical results on their performance on DNA sequences are reported. The algorithms generalize the Aho-Corasick, filtration, and superalphabet techniques of string matching to scoring matrix search. Compared to the naive search, our algorithms can be faster by a factor proportional to the length of the pattern. In our experimental comparison, the new algorithms were clearly faster than the naive method and also faster than the well-known lookahead scoring algorithm. The Aho-Corasick technique is the fastest for short patterns and high significance thresholds. For longer patterns the filtration method is better, while the superalphabet technique is the best for very long patterns and low significance levels. We also observed that the actual speed of all these algorithms is very sensitive to implementation details.
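For orientation, the two baselines named in the abstract (the naive search and lookahead scoring) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `naive_search`, `lookahead_search`, and `max_rest`, and the representation of the matrix as a list of per-position score dictionaries, are assumptions made here. A window counts as a hit when its total score is at least the threshold. The sketch does not cover the Aho-Corasick, filtration, or superalphabet algorithms themselves.

```python
from typing import Dict, List

# Assumed representation: one dict of symbol -> score per pattern position.
Matrix = List[Dict[str, float]]


def naive_search(seq: str, pssm: Matrix, threshold: float) -> List[int]:
    """Score every window of length m in full; O(n * m) symbol lookups."""
    m = len(pssm)
    hits = []
    for i in range(len(seq) - m + 1):
        score = sum(pssm[j][seq[i + j]] for j in range(m))
        if score >= threshold:
            hits.append(i)
    return hits


def lookahead_search(seq: str, pssm: Matrix, threshold: float) -> List[int]:
    """Same result as naive_search, but abandon a window as soon as even
    the best possible remaining columns cannot reach the threshold."""
    m = len(pssm)
    # max_rest[j] = largest score obtainable from columns j .. m-1.
    max_rest = [0.0] * (m + 1)
    for j in range(m - 1, -1, -1):
        max_rest[j] = max_rest[j + 1] + max(pssm[j].values())

    hits = []
    for i in range(len(seq) - m + 1):
        score = 0.0
        for j in range(m):
            score += pssm[j][seq[i + j]]
            if score + max_rest[j + 1] < threshold:
                break  # this window can no longer reach the threshold
        else:
            # No early exit: the full score is at least the threshold.
            hits.append(i)
    return hits
```

Both functions return the start positions of all hits. The lookahead variant prunes more aggressively as the threshold rises, since fewer partial scores can still be rescued by the remaining columns.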