Generalized Substring Compression

Authors:
Orgad Keller; Tsvi Kopelowitz; Shir Landau; Moshe Lewenstein
Affiliations:
Department of Computer Science, Bar-Ilan University, Ramat-Gan, Israel 52900 (all authors)
Venue:
CPM '09 Proceedings of the 20th Annual Symposium on Combinatorial Pattern Matching
Year:
2009



Abstract

In substring compression, one is given a text to preprocess so that, upon request, a compressed substring is returned. Generalized substring compression is the same problem with the following twist: each query additionally contains a context substring (or a collection of context substrings), and the answer is the requested substring in compressed form, where the context is used to make the compression more efficient. We focus on generalized substring compression and present the first non-trivial correct algorithm for this problem. As part of our algorithm, we propose a method for finding the bounded longest common prefix of substrings, which may be of independent interest. In addition, we propose an efficient algorithm for substring compression that uses range searching for minimum queries. We present several tradeoffs for both problems. For compressing the substring S[i..j] (possibly with the substring S[α..β] as a context), the best query times we achieve are $O(C)$ for a substring compression query and $O\big(C\log\big(\frac{j-i}{C}\big)\big)$ for a generalized substring compression query, where C is the number of phrases encoded.
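To make the query semantics concrete, here is a brute-force Python sketch of a generalized substring compression query. It is not the authors' algorithm and makes no attempt at their query bounds. It assumes (this is our assumption, not something the abstract states) that each phrase is the longest prefix of the remaining part of S[i..j] that occurs entirely inside the context S[α..β], and that a character absent from the context is emitted as a one-character literal phrase. The helper `longest_bounded_match` is a naive stand-in for a bounded longest common prefix query, where the match may not run past β. Both function names are hypothetical, and indices are 0-based and inclusive.

```python
from typing import List, Tuple, Union

# A phrase is either ("copy", source_position, length) or ("lit", character).
Phrase = Union[Tuple[str, int, int], Tuple[str, str]]


def longest_bounded_match(s: str, k: int, j: int, alpha: int, beta: int) -> Tuple[int, int]:
    """Naive bounded-LCP stand-in (assumed semantics): the longest prefix of
    s[k..j] that occurs as a substring starting at some l in [alpha, beta]
    and ending no later than beta. Returns (length, start); start is -1 if
    no character matches. All indices are 0-based and inclusive."""
    best_len, best_pos = 0, -1
    for start in range(alpha, beta + 1):
        length = 0
        # Extend the match while it stays inside both s[k..j] and s[start..beta].
        while (k + length <= j and start + length <= beta
               and s[k + length] == s[start + length]):
            length += 1
        if length > best_len:  # keep the leftmost longest match
            best_len, best_pos = length, start
    return best_len, best_pos


def generalized_substring_compression(s: str, i: int, j: int,
                                      alpha: int, beta: int) -> List[Phrase]:
    """Brute-force greedy parse of s[i..j] that copies only from the context
    s[alpha..beta] (an assumption about the phrase definition)."""
    phrases: List[Phrase] = []
    k = i
    while k <= j:
        length, start = longest_bounded_match(s, k, j, alpha, beta)
        if length == 0:
            # Assumed fallback: a character absent from the context is a literal.
            phrases.append(("lit", s[k]))
            k += 1
        else:
            phrases.append(("copy", start, length))
            k += length
    return phrases


if __name__ == "__main__":
    text = "abracadabra"
    # Compress S[7..10] = "abra" with context S[0..3] = "abra": one copy phrase.
    print(generalized_substring_compression(text, 7, 10, 0, 3))
    # prints [('copy', 0, 4)]
```

The length of the returned list plays the role of C. This baseline performs on the order of C·(β−α+1)·(j−i+1) character comparisons in the worst case, far from the query times quoted above; it is only meant as a reference against which a faster, preprocessing-based implementation could be checked.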