Indexed Hierarchical Approximate String Matching

Authors:
Luís M. Russo; Gonzalo Navarro; Arlindo L. Oliveira
Affiliations:
INESC-ID, Lisboa, Portugal 1000 and CITI, Departamento de Informática, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Portugal; Dept. of Computer Science, University of Chile; INESC-ID, Lisboa, Portugal 1000 and Instituto Superior Técnico, Universidade Técnica de Lisboa, Portugal
Venue:
SPIRE '08: Proceedings of the 15th International Symposium on String Processing and Information Retrieval
Year:
2008

Abstract

We present a new search procedure for approximate string matching over suffix trees. We show that hierarchical verification, a well-established technique for on-line searching, can also be used with an indexed approach. This requires the index to support bidirectionality, meaning that the search for a pattern can be updated by adding a letter at either its right or its left end. Bidirectionality turns out to be easily supported by most compressed text self-indexes, which represent both the index and the text in essentially the same space as the compressed text alone. To complete the symbiotic exchange, our hierarchical verification greatly reduces the need to access the text, which is expensive in compressed text self-indexes. In particular, the resulting algorithm can run over an existing fully compressed suffix tree, which makes it very appealing for applications in computational biology. We compare our algorithm with related approaches, showing that our method offers an interesting space/time tradeoff and, in particular, does not need any parameterization, whereas the most successful competing approaches require it.
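Hierarchical verification rests on a simple lemma: if a pattern P = P1P2 matches a text substring with at most k edits, then P1 or P2 matches some text substring with at most ⌊k/2⌋ edits, because the two halves share the k errors. Applied recursively, this yields a tree of pattern pieces whose leaves are searched first; a parent piece is only checked in a small text window around its child's occurrences. The Python sketch below illustrates the on-line form of the technique under simplifying assumptions: unit-cost edit distance, plain Sellers dynamic programming in place of any index, and a fixed number of splitting levels. The names `sellers_ends`, `Piece`, `build`, `hierarchical_search` and the `levels` parameter are hypothetical; this is not the authors' indexed procedure, which runs over a suffix tree.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


def sellers_ends(pattern: str, text: str, k: int, offset: int = 0) -> Set[int]:
    """End positions e (exclusive, shifted by `offset`) such that some
    substring of `text` ending at e is within edit distance k of `pattern`.
    Sellers' dynamic programming, one column per text character."""
    m = len(pattern)
    col = list(range(m + 1))  # column for the empty text prefix
    ends = {offset} if col[m] <= k else set()
    for j, c in enumerate(text, start=1):
        new = [0] * (m + 1)  # row 0 stays 0: an occurrence may start anywhere
        for i in range(1, m + 1):
            new[i] = min(col[i] + 1,                          # extra text char
                         new[i - 1] + 1,                      # missing pattern char
                         col[i - 1] + (pattern[i - 1] != c))  # match or substitution
        col = new
        if col[m] <= k:
            ends.add(offset + j)
    return ends


@dataclass
class Piece:
    lo: int                      # this piece is pattern[lo:hi]
    hi: int
    k: int                       # error threshold for this piece
    parent: Optional["Piece"] = None


def build(lo: int, hi: int, k: int, levels: int,
          parent: Optional[Piece], leaves: List[Piece]) -> Piece:
    """Split pattern[lo:hi] in halves `levels` times; each half gets k // 2."""
    node = Piece(lo, hi, k, parent)
    if levels == 0 or hi - lo < 2:
        leaves.append(node)
    else:
        mid = (lo + hi) // 2
        build(lo, mid, k // 2, levels - 1, node, leaves)
        build(mid, hi, k // 2, levels - 1, node, leaves)
    return node


def hierarchical_search(pattern: str, text: str, k: int, levels: int = 2) -> List[int]:
    """End positions of approximate occurrences of `pattern` (<= k edits),
    found by searching the leaf pieces and verifying ancestors bottom-up."""
    leaves: List[Piece] = []
    build(0, len(pattern), k, levels, None, leaves)
    results: Set[int] = set()
    for leaf in leaves:
        node = leaf
        ends = sellers_ends(pattern[leaf.lo:leaf.hi], text, leaf.k)
        # Climb towards the root, checking each parent only in a text window
        # around the ends found for its child.
        while node.parent is not None and ends:
            par = node.parent
            parent_ends: Set[int] = set()
            for e in ends:
                start = max(0, e - (node.hi - par.lo) - par.k)
                stop = min(len(text), e + (par.hi - node.hi) + par.k)
                parent_ends |= sellers_ends(pattern[par.lo:par.hi],
                                            text[start:stop], par.k, offset=start)
            node, ends = par, parent_ends
        if node.parent is None:  # reached the root: these are real matches
            results |= ends
    return sorted(results)
```

For example, `hierarchical_search("brwn fox", "the quick brown fox jumps", k=1)` should return the same end positions as a full scan with `sellers_ends`, while running the check on the whole pattern only inside small windows. In the indexed setting described in the abstract, as we read it, the window step is what a bidirectional index replaces: extending a matched piece to its parent amounts to adding letters on its left and on its right.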