Computing matching statistics and maximal exact matches on compressed full-text indexes

Authors:
Enno Ohlebusch;Simon Gog;Adrian Kügel
Affiliations:
Institute of Theoretical Computer Science, University of Ulm, Ulm;Institute of Theoretical Computer Science, University of Ulm, Ulm;Institute of Theoretical Computer Science, University of Ulm, Ulm
Venue:
SPIRE'10 Proceedings of the 17th international conference on String processing and information retrieval
Year:
2010

Abstract

Exact string matching is a problem that computer programmers face on a regular basis, and full-text indexes like the suffix tree or the suffix array provide fast string search over large texts. In the last decade, research on compressed indexes has flourished because the main problem in large-scale applications is the space consumption of the index. Nowadays, the most successful compressed indexes are able to obtain almost optimal space and search time simultaneously. It is known that a myriad of sequence analysis and comparison problems can be solved efficiently with established data structures like the suffix tree or the suffix array, but algorithms on compressed indexes that solve these problems are still lacking. Here, we show that matching statistics and maximal exact matches between two strings S1 and S2 can be computed efficiently by matching S2 backwards against a compressed index of S1.
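
To make the idea of matching S2 backwards against an index of S1 concrete, the following is a minimal Python sketch. It is not the authors' implementation. It assumes the usual definition of matching statistics (for each position i of S2, the length of the longest prefix of S2[i..] that occurs in S1), and it uses plain, uncompressed arrays (suffix array, BWT, LCP array) as a stand-in for a compressed index. The names `build_index` and `matching_statistics` are hypothetical, and the step that shortens a match by moving to the enclosing LCP interval is an assumption about how a failed backward step can be handled; the abstract does not spell it out.

```python
def build_index(s1):
    """Naive, uncompressed stand-in for a compressed index of s1 (assumption)."""
    text = s1 + "\x00"  # sentinel; assumed not to occur in s1 or s2
    n = len(text)
    sa = sorted(range(n), key=lambda i: text[i:])  # suffix array, quadratic but simple
    bwt = [text[i - 1] for i in sa]                # Burrows-Wheeler transform
    # lcp[k] = longest common prefix of suffixes sa[k-1] and sa[k];
    # lcp[0] and lcp[n] stay 0 and act as boundaries.
    lcp = [0] * (n + 1)
    for k in range(1, n):
        a, b, l = sa[k - 1], sa[k], 0
        while a + l < n and b + l < n and text[a + l] == text[b + l]:
            l += 1
        lcp[k] = l
    # C[c] = number of characters in text that are smaller than c
    C, total = {}, 0
    for c in sorted(set(text)):
        C[c] = total
        total += text.count(c)
    return bwt, lcp, C


def matching_statistics(s1, s2):
    """ms[i] = length of the longest prefix of s2[i:] that occurs in s1."""
    bwt, lcp, C = build_index(s1)
    n = len(bwt)

    def occ(c, k):
        # Occurrences of c in bwt[0:k]; a real index would answer this rank
        # query with a succinct structure instead of a linear scan.
        return bwt[:k].count(c)

    ms = [0] * len(s2)
    lb, rb, q = 0, n - 1, 0  # suffix-array interval and length of the current match
    for i in range(len(s2) - 1, -1, -1):
        c = s2[i]
        while True:
            if c in C:
                new_lb = C[c] + occ(c, lb)
                new_rb = C[c] + occ(c, rb + 1) - 1
            else:
                new_lb, new_rb = 1, 0  # empty interval
            if new_lb <= new_rb:       # backward step succeeded: match grows by one
                lb, rb, q = new_lb, new_rb, q + 1
                break
            if q == 0:                 # c does not occur in s1 at all
                break
            # Backward step failed: shorten the match from the right by moving
            # to the enclosing LCP interval, then retry with the same character.
            q = max(lcp[lb], lcp[rb + 1])
            while lb > 0 and lcp[lb] >= q:
                lb -= 1
            while rb < n - 1 and lcp[rb + 1] >= q:
                rb += 1
        ms[i] = q
    return ms


if __name__ == "__main__":
    print(matching_statistics("banana", "ananas"))  # prints [5, 4, 3, 2, 1, 0]
```

In this sketch every position i with ms[i] > 0 marks an exact match of length ms[i] between S2 and S1; reporting maximal exact matches would additionally require checking left-maximality against the preceding characters, which is omitted here.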