Multi-pattern matching with bidirectional indexes

Authors:
Simon Gog;Kalle Karhu;Juha Kärkkäinen;Veli Mäkinen;Niko Välimäki
Affiliations:
Department of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia;Department of Computer Science and Engineering, Aalto University, Espoo, Finland;Department of Computer Science, University of Helsinki, Helsinki, Finland;Department of Computer Science, University of Helsinki, Helsinki, Finland;Department of Computer Science, University of Helsinki, Helsinki, Finland
Venue:
Journal of Discrete Algorithms
Year:
2014

Abstract

We study multi-pattern matching in a scenario where the pattern set is to be matched against several texts, so that indexing the pattern set is affordable. Scenarios of this kind arise, for example, in metagenomics, where the pattern set represents the DNA of several species and the goal is to find out which species are present in the sample and in what quantity. We develop a generic search method that exploits bidirectional indexes for both the pattern set and the texts, and we analyze the best-case and worst-case running time of the method on a worst-case text. We show that finding the instance of the search method with the minimum best-case running time on a worst-case text is NP-hard. The positive result is that an instance within a logarithmic factor of the minimum best-case running time can be found in polynomial time using a bidirectional index called the affix tree. We further show that affix trees can be simulated, in reduced space, using a bidirectional variant of compressed suffix trees. Lastly, we provide a practical implementation of this approach. We show that one can obtain a 3-fold speedup over the basic scenario of searching each pattern independently on data sets typical of high-throughput DNA sequencing.
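For orientation, the baseline that the abstract compares against (searching each pattern independently in an indexed text) can be sketched as follows. This is only an illustrative sketch under simplifying assumptions: it uses a plain, naively built suffix array rather than the compressed bidirectional indexes studied in the paper, and the function names `build_suffix_array`, `sa_interval`, and `match_independently` are hypothetical, not taken from the authors' implementation.

```python
def build_suffix_array(text):
    """Naive suffix array construction; adequate only for small examples."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def sa_interval(text, sa, pattern):
    """Return the half-open suffix array range [start, end) of suffixes
    that begin with `pattern`, found by two binary searches."""
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def match_independently(text, patterns):
    """Baseline: index the text once, then search every pattern separately."""
    sa = build_suffix_array(text)
    result = {}
    for p in patterns:
        start, end = sa_interval(text, sa, p)
        result[p] = sorted(sa[start:end])
    return result


if __name__ == "__main__":
    hits = match_independently("ACGTACGTTACG", ["ACG", "CGT", "TTT"])
    print(hits)  # {'ACG': [0, 4, 9], 'CGT': [1, 5], 'TTT': []}
```

Each pattern here pays for its own search from scratch. The approach described in the abstract instead also indexes the pattern set, so that work can be shared across patterns; that part is not reproduced in this sketch.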