Multi-pattern matching with bidirectional indexes

  • Authors:
  • Simon Gog;Kalle Karhu;Juha Kärkkäinen;Veli Mäkinen;Niko Välimäki

  • Affiliations:
  • Department of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia;Department of Computer Science and Engineering, Aalto University, Espoo, Finland;Department of Computer Science, University of Helsinki, Helsinki, Finland;Department of Computer Science, University of Helsinki, Helsinki, Finland;Department of Computer Science, University of Helsinki, Helsinki, Finland

  • Venue:
  • Journal of Discrete Algorithms
  • Year:
  • 2014

Abstract

We study multi-pattern matching in a scenario where the pattern set is to be matched against several texts, so that indexing the pattern set is affordable. Such scenarios arise, for example, in metagenomics, where the pattern set represents the DNA of several species and the goal is to find out which species are represented in the sample and in what quantity. We develop a generic search method that exploits bidirectional indexes for both the pattern set and the texts, and we analyze the best-case and worst-case running time of the method on a worst-case text. We show that finding the instance of the search method with minimum best-case running time on a worst-case text is NP-hard. The positive result is that an instance with a logarithmic-factor approximation to the minimum best-case running time can be found in polynomial time using a bidirectional index called the affix tree. We further show that affix trees can be simulated, in reduced space, using a bidirectional variant of compressed suffix trees. Lastly, we provide a practical implementation of this approach. We show that one can obtain a 3-fold speedup over the basic scenario of searching each pattern independently on data sets typical in high-throughput DNA sequencing.
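
To illustrate what "bidirectional" means here, the following minimal Python sketch shows the interface such an index offers: a match can be extended by one character on either side. It is not the affix tree or the compressed suffix tree variant from the paper; the class `NaiveBidirectionalIndex`, the `Match` record, and the example text are hypothetical names chosen for illustration, and each extension simply scans the current occurrence list rather than running in the time bounds a real index would achieve.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Match:
    """State of a bidirectional search: the matched string and its occurrences."""
    pattern: str
    starts: List[int]  # start position of every occurrence of `pattern` in the text


class NaiveBidirectionalIndex:
    """Toy stand-in for a bidirectional index (assumption: not an affix tree)."""

    def __init__(self, text: str) -> None:
        self.text = text

    def empty(self) -> Match:
        # The empty string occurs at every position, including the end of the text.
        return Match("", list(range(len(self.text) + 1)))

    def extend_right(self, m: Match, c: str) -> Match:
        # Keep occurrences whose next character to the right is `c`.
        k = len(m.pattern)
        starts = [i for i in m.starts
                  if i + k < len(self.text) and self.text[i + k] == c]
        return Match(m.pattern + c, starts)

    def extend_left(self, m: Match, c: str) -> Match:
        # Keep occurrences whose preceding character is `c`; they now start one earlier.
        starts = [i - 1 for i in m.starts if i > 0 and self.text[i - 1] == c]
        return Match(c + m.pattern, starts)


# Usage: start from a middle character and grow the match in both directions.
index = NaiveBidirectionalIndex("GATTACAGATTA")
m = index.extend_right(index.empty(), "T")  # "T"
m = index.extend_left(m, "T")               # "TT"
m = index.extend_right(m, "A")              # "TTA"
m = index.extend_left(m, "A")               # "ATTA"
print(m.pattern, m.starts)                  # ATTA [1, 8]
```

Being free to start a search anywhere inside a pattern and extend in either direction is what gives a search method of this kind its choice of search order; the sketch only demonstrates the two extension operations, not how an order is selected.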