Compressed indexes for approximate string matching

Authors:
Ho-Leung Chan;Tak-Wah Lam;Wing-Kin Sung;Siu-Lung Tam;Swee-Seong Wong
Affiliations:
Department of Computer Science, University of Hong Kong;Department of Computer Science, University of Hong Kong;Department of Computer Science, National University of Singapore;Department of Computer Science, University of Hong Kong;Department of Computer Science, National University of Singapore
Venue:
ESA'06 Proceedings of the 14th conference on Annual European Symposium - Volume 14
Year:
2006

Abstract

We revisit the problem of indexing a string S[1..n] to support searching for all substrings of S that match a given pattern P[1..m] with at most k errors. Previous solutions either require an index of size exponential in k or need Ω(m^k) time for searching. Motivated by the indexing of DNA sequences, we investigate space-efficient indexes that occupy only O(n) space. For k = 1, we give an index that supports matching in O(m + occ + log n log log n) time. The previously best solution achieving this time complexity requires an index of size O(n log n). This new index can be used to improve existing indexes for k ≥ 2 errors. Among others, it can support matching with k = 2 errors in O(m log n log log n + occ) time.
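
To make the problem statement concrete, the following is a minimal sketch of an index-free baseline, not the index described in the abstract. It uses the classical column-by-column dynamic program for approximate matching and reports the positions in S where a substring matching P with at most k errors ends. Two assumptions are made here that the abstract does not state: "errors" is read as edit distance (insertions, deletions, substitutions), and occurrences are reported by their 1-based end positions. The function name `approx_match_ends` is hypothetical.

```python
def approx_match_ends(text: str, pattern: str, k: int) -> list[int]:
    """Return 1-based end positions j such that some substring of `text`
    ending at j is within edit distance k of `pattern`.

    Assumption: "errors" means edit distance; this is a plain scan,
    not a compressed index.
    """
    m = len(pattern)
    # col[i] = minimum edit distance between pattern[:i] and any substring
    # of the text ending at the current text position. Before any text is
    # read, matching pattern[:i] costs i insertions.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]  # previous column's value at row i - 1
        col[0] = 0          # a match may start at any text position
        for i in range(1, m + 1):
            prev = col[i]   # previous column's value at row i
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(
                prev_diag + cost,  # match or substitute
                prev + 1,          # extra text character
                col[i - 1] + 1,    # missing pattern character
            )
            prev_diag = prev
        if col[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(approx_match_ends("abcdefg", "cxe", 1))
    print(approx_match_ends("abcabc", "abc", 0))
```

This scan takes O(nm) time per query and uses no preprocessing of S, which is the cost an index is meant to avoid: the bounds quoted in the abstract depend on m and occ plus polylogarithmic terms in n, rather than on scanning all of S.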