Indexed multi-pattern matching
LATIN'12 Proceedings of the 10th Latin American international conference on Theoretical Informatics
We study multi-pattern matching in a scenario where the pattern set is to be matched against several texts, so indexing the pattern set is affordable. Such scenarios arise, for example, in metagenomics, where the pattern set represents the DNA of several species and the goal is to determine which species are present in a sample and in what quantity. We develop a generic search method that exploits bidirectional indexes for both the pattern set and the texts, and we analyze its best-case and worst-case running times on a worst-case text. We show that finding the instance of the search method with the minimum best-case running time on a worst-case text is NP-hard. On the positive side, an instance whose best-case running time approximates this minimum within a logarithmic factor can be found in polynomial time using a bidirectional index called the affix tree. We further show that affix trees can be simulated in reduced space using a bidirectional variant of compressed suffix trees. Finally, we provide a practical implementation of this approach and show that it achieves a 3-fold speedup over the basic approach of searching for each pattern independently, on data sets typical of high-throughput DNA sequencing.
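The abstract's starting point is that indexing the pattern set lets work shared between patterns be done once rather than once per pattern. The sketch below illustrates only that effect. It is not the paper's bidirectional search method or its affix-tree instance. It assumes a plain (unidirectional) suffix array of the text and a trie of the patterns, and all function names are hypothetical. Each pattern character narrows a suffix-array interval. Searching each pattern independently repeats that narrowing for every pattern, whereas a depth-first walk of the pattern trie performs it once per distinct prefix and prunes branches whose interval becomes empty.

```python
from bisect import bisect_left, bisect_right


def suffix_array(text):
    # Naive construction, O(n^2 log n); adequate for a toy illustration only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def refine(text, sa, lo, hi, depth, ch):
    """Narrow SA interval [lo, hi), whose suffixes share a prefix of length
    `depth`, to the suffixes whose next character is `ch`."""
    # Inside the interval the characters at offset `depth` are sorted;
    # a suffix that ends exactly at `depth` maps to "" and sorts first.
    keys = [text[sa[k] + depth] if sa[k] + depth < len(text) else ""
            for k in range(lo, hi)]
    return lo + bisect_left(keys, ch), lo + bisect_right(keys, ch)


def search_each(text, sa, patterns):
    """Baseline: search every pattern independently, counting refinements."""
    steps, result = 0, {}
    for p in patterns:
        lo, hi = 0, len(sa)
        for depth, ch in enumerate(p):
            lo, hi = refine(text, sa, lo, hi, depth, ch)
            steps += 1
            if lo == hi:  # no occurrence; stop early
                break
        result[p] = sorted(sa[lo:hi])
    return result, steps


def build_trie(patterns):
    # Each node stores its children and the patterns that end at it.
    root = {"kids": {}, "ends": []}
    for p in patterns:
        node = root
        for ch in p:
            node = node["kids"].setdefault(ch, {"kids": {}, "ends": []})
        node["ends"].append(p)
    return root


def search_trie(text, sa, patterns):
    """Shared search: one interval refinement per explored trie edge."""
    result = {p: [] for p in patterns}
    steps = 0
    stack = [(build_trie(patterns), 0, len(sa), 0)]
    while stack:
        node, lo, hi, depth = stack.pop()
        for p in node["ends"]:
            result[p] = sorted(sa[lo:hi])
        for ch, child in node["kids"].items():
            clo, chi = refine(text, sa, lo, hi, depth, ch)
            steps += 1
            if clo < chi:  # prune prefixes that do not occur in the text
                stack.append((child, clo, chi, depth + 1))
    return result, steps


if __name__ == "__main__":
    text = "ACGTACGTTACG"
    patterns = ["ACG", "ACGT", "ACT", "GTT", "TTT"]
    sa = suffix_array(text)
    each, s1 = search_each(text, sa, patterns)
    shared, s2 = search_trie(text, sa, patterns)
    assert each == shared
    print(s1, s2)  # prints: 16 11
```

On this toy input both searches report the same occurrences, but the trie walk needs 11 interval refinements instead of 16, because prefixes such as `A`, `AC` and `ACG` are narrowed once instead of once per pattern. According to the abstract, the paper's method goes further by using bidirectional indexes on both the patterns and the text, so matches can also be extended to the left. This sketch only shares prefixes.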