Efficient subsequence search in databases

  • Authors:
  • Rohit Jain;Mukesh K. Mohania;Sunil Prabhakar

  • Affiliations:
  • Department of Computer Sciences, Purdue University, West Lafayette, IN;IBM India Research Lab, New Delhi, India;Department of Computer Sciences, Purdue University, West Lafayette, IN

  • Venue:
  • WAIM'13 Proceedings of the 14th international conference on Web-Age Information Management
  • Year:
  • 2013

Abstract

Finding tuples in a database that match a particular subsequence (with gaps) is an important problem for a range of applications. Subsequence search is equivalent to searching for regular expressions of the type .*q1.*q2.* … .*ql.*, where the subsequence is q1q2…ql. Executing these queries efficiently requires index structures that are both fast and able to scale to large problem sizes. This paper presents two index structures for such queries, based on a trie and a bitmap, respectively. These indices are disk-resident and can therefore easily be used by large databases with limited memory. They are applicable to dynamic databases, where tuples can be added or deleted. Both indices are implemented and validated against a naive approach. The results show that the proposed indices are efficient, with low I/O and time overhead.
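
The equivalence between subsequence search and the regular expression form stated in the abstract can be illustrated with a minimal sketch. This is not the paper's trie or bitmap index, and the abstract does not describe its naive baseline; the linear scan below is only an assumed stand-in, and the function names (subsequence_regex, is_subsequence, naive_search) are hypothetical.

```python
import re


def subsequence_regex(query):
    """Build the pattern .*q1.*q2.* ... .*ql.* for the query q1q2...ql."""
    # re.escape keeps symbols such as '.' or '*' in the query literal.
    body = ".*".join(re.escape(symbol) for symbol in query)
    return re.compile(".*" + body + ".*", re.DOTALL)


def is_subsequence(query, text):
    """Return True if the symbols of query occur in text in order (gaps allowed)."""
    remaining = iter(text)
    # 'symbol in remaining' consumes the iterator up to the first match,
    # so each query symbol must be found after the previous one.
    return all(symbol in remaining for symbol in query)


def naive_search(query, tuples):
    """Assumed baseline: scan every tuple, with no index at all."""
    return [t for t in tuples if is_subsequence(query, t)]


if __name__ == "__main__":
    data = ["database", "dbs", "abacus", "sad"]
    print(naive_search("dbs", data))
    pattern = subsequence_regex("dbs")
    print([t for t in data if pattern.fullmatch(t)])
```

Both the scan and the regular expression select the same tuples; the scan touches every tuple on every query, which is the cost an index structure is meant to avoid.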