Fast parallel and serial approximate string matching
Journal of Algorithms
Data compression via textual substitution
Journal of the ACM (JACM)
An analysis of the Burrows-Wheeler transform
Journal of the ACM (JACM)
Approximate String Matching: A Simpler Faster Algorithm
SIAM Journal on Computing
Application of Lempel-Ziv factorization to the approximation of grammar-based compression
Theoretical Computer Science
Note: A simple storage scheme for strings achieving entropy bounds
Theoretical Computer Science
Compressed Text Indexes with Fast Locate
CPM '07 Proceedings of the 18th annual symposium on Combinatorial Pattern Matching
Run-Length Compressed Indexes Are Superior for Highly Repetitive Sequence Collections
SPIRE '08 Proceedings of the 15th International Symposium on String Processing and Information Retrieval
On compressing the textual web
Proceedings of the third ACM international conference on Web search and data mining
LZ77-Like Compression with Fast Random Access
DCC '10 Proceedings of the 2010 Data Compression Conference
CPM'11 Proceedings of the 22nd annual conference on Combinatorial pattern matching
Random access to grammar-compressed strings
Proceedings of the twenty-second annual ACM-SIAM symposium on Discrete Algorithms
Stronger Lempel-Ziv Based Compressed Text Indexing
Algorithmica
Grammar-based compression in a streaming model
LATA'10 Proceedings of the 4th international conference on Language and Automata Theory and Applications
Motivated by the imminent growth of massive, highly redundant genomic databases, we study the problem of compressing a string database while simultaneously supporting fast random access, substring extraction and pattern matching to the underlying string(s). Bille et al. (2011) recently showed how, given a straight-line program with r rules for a string s of length n, we can build an $\mathcal{O}(r)$-word data structure that allows us to extract any substring s[i..j] in $\mathcal{O}(\log n + j - i)$ time. They also showed how, given a pattern p of length m and an edit distance k ≤ m, their data structure supports finding all occ approximate matches to p in s in $\mathcal{O}(r (\min(mk, k^4 + m) + \log n) + \mathsf{occ})$ time. Rytter (2003) and Charikar et al. (2005) showed that r is always at least the number z of phrases in the LZ77 parse of s, and gave algorithms for building straight-line programs with $\mathcal{O}(z \log n)$ rules. In this paper we give a simple $\mathcal{O}(z \log n)$-word data structure that takes the same time for substring extraction but only $\mathcal{O}(z \min(mk, k^4 + m) + \mathsf{occ})$ time for approximate pattern matching.
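To make the notion of substring extraction from a straight-line program concrete, here is a minimal Python sketch. It is not the data structure of Bille et al. or the one proposed in the paper: it is a naive top-down descent whose running time is proportional to the height of the grammar plus the length of the extracted substring, so it only matches the $\mathcal{O}(\log n + j - i)$ bound when the grammar happens to be balanced. The rule encoding, the function names `rule_lengths` and `extract`, and the example grammar are all assumptions made for illustration.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: a rule is either a single character (terminal)
# or a pair of indices of earlier rules (concatenation).
# The last rule derives the whole string s.
Rule = Union[str, Tuple[int, int]]


def rule_lengths(rules: List[Rule]) -> List[int]:
    """Length of the string derived by each rule, computed bottom-up."""
    lengths: List[int] = []
    for rule in rules:
        if isinstance(rule, str):
            lengths.append(1)
        else:
            left, right = rule
            lengths.append(lengths[left] + lengths[right])
    return lengths


def extract(rules: List[Rule], lengths: List[int], i: int, j: int) -> str:
    """Return s[i..j] (0-based, inclusive) without expanding all of s."""
    out: List[str] = []
    # Stack entries are (rule index, start, end), offsets relative to the rule.
    stack = [(len(rules) - 1, i, j)]
    while stack:
        idx, lo, hi = stack.pop()
        rule = rules[idx]
        if isinstance(rule, str):
            out.append(rule)
            continue
        left, right = rule
        left_len = lengths[left]
        # Push the right part first so that the left part is emitted first.
        if hi >= left_len:
            stack.append((right, max(lo - left_len, 0), hi - left_len))
        if lo < left_len:
            stack.append((left, lo, min(hi, left_len - 1)))
    return "".join(out)


# Example grammar with r = 5 rules deriving s = "abababab" (n = 8).
EXAMPLE_RULES: List[Rule] = ["a", "b", (0, 1), (2, 2), (3, 3)]
EXAMPLE_LENGTHS = rule_lengths(EXAMPLE_RULES)
```

With this example grammar, `extract(EXAMPLE_RULES, EXAMPLE_LENGTHS, 2, 5)` descends only into the parts of the derivation tree that overlap positions 2 through 5. The grammar is highly repetitive on purpose: five rules suffice for a string of length eight, which is the kind of redundancy the abstract has in mind.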