Efficient subsequence search in databases

Authors:
Rohit Jain;Mukesh K. Mohania;Sunil Prabhakar
Affiliations:
Department of Computer Sciences, Purdue University, West Lafayette, IN;IBM India Research Lab, New Delhi, India;Department of Computer Sciences, Purdue University, West Lafayette, IN
Venue:
WAIM'13 Proceedings of the 14th international conference on Web-Age Information Management
Year:
2013

Abstract

Finding the tuples in a database that contain a given subsequence, possibly with gaps between its elements, is an important problem for a range of applications. Subsequence search is equivalent to searching for regular expressions of the form .*q1.*q2.* … .*ql.*, where the subsequence is q1q2 … ql. Executing these queries efficiently requires index structures that scale to large problem sizes. This paper presents two such index structures, based on a trie and a bitmap. Both indices are disk-resident, so large databases with limited memory can use them easily. They also apply to dynamic databases, in which tuples can be added or deleted. Both indices are implemented and validated against a naive approach. The results show that the proposed indices are efficient, with low I/O and time overhead.
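
To make the query semantics concrete, the following is a minimal sketch of an unindexed linear scan together with the equivalent regular expression. It is not the trie or bitmap index from the paper, whose details the abstract does not give, and it is only one plausible reading of the "naive approach" mentioned there. The function names (is_subsequence, subsequence_regex, naive_search) and the sample tuples are hypothetical, and tuples are assumed to be plain strings.

```python
import re


def is_subsequence(query, text):
    """Return True if the characters of `query` occur in `text` in order, gaps allowed."""
    remaining = iter(text)
    # `ch in remaining` consumes the iterator up to and including the first match,
    # so each query character must be found after the previous one.
    return all(ch in remaining for ch in query)


def subsequence_regex(query):
    """Build the equivalent pattern .*q1.*q2.* ... .*ql.* for the given query."""
    body = ".*".join(re.escape(ch) for ch in query)
    return re.compile(".*" + body + ".*", re.DOTALL)


def naive_search(tuples, query):
    """Scan every tuple (assumed to be a string) and keep those containing the subsequence."""
    return [t for t in tuples if is_subsequence(query, t)]


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    tuples = ["database", "dataset", "bitmap", "trie"]
    print(naive_search(tuples, "dts"))
    pattern = subsequence_regex("dts")
    print([t for t in tuples if pattern.fullmatch(t)])
```

The scan inspects every tuple in full for each query, which is the cost an index is meant to avoid on large, disk-resident data.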