An Online Algorithm for Finding the Longest Previous Factors

Authors:
Daisuke Okanohara;Kunihiko Sadakane
Affiliations:
Department of Computer Science, University of Tokyo, Tokyo, Japan 113-0013;Department of Computer Science and Communication Engineering, Kyushu University, Fukuoka, Japan 819-0395
Venue:
ESA '08 Proceedings of the 16th annual European symposium on Algorithms
Year:
2008

Citing 18
Cited 6

New indices for text: PAT Trees and PAT arrays

Information retrieval
Suffix arrays: a new method for on-line string searches

SIAM Journal on Computing
An improved data structure for cumulative probability tables

Software—Practice & Experience
Succinct representations of lcp information and improvements in the compressed suffix arrays

SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications

CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching
Extended application of suffix trees to data compression

DCC '96 Proceedings of the Conference on Data Compression
Opportunistic data structures with applications

FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Replacing suffix trees with enhanced suffix arrays

Journal of Discrete Algorithms - SPIRE 2002
Compressed full-text indexes

ACM Computing Surveys (CSUR)
Compressed indexes for dynamic text collections

ACM Transactions on Algorithms (TALG)
Compressed Suffix Trees with Full Functionality

Theory of Computing Systems
Computing Longest Previous Factor in linear time and applications

Information Processing Letters
A Simple Algorithm for Computing the Lempel Ziv Factorization

DCC '08 Proceedings of the Data Compression Conference
Fast and Practical Algorithms for Computing All the Runs in a String

CPM '07 Proceedings of the 18th annual symposium on Combinatorial Pattern Matching
Linear pattern matching algorithms

SWAT '73 Proceedings of the 14th Annual Symposium on Switching and Automata Theory (swat 1973)
Theoretical and practical improvements on the RMQ-Problem, with applications to LCA and LCE

CPM'06 Proceedings of the 17th Annual conference on Combinatorial Pattern Matching
Dynamic rank-select structures with applications to run-length encoded texts

CPM'07 Proceedings of the 18th annual conference on Combinatorial Pattern Matching
A new succinct representation of RMQ-information and improvements in the enhanced suffix array

ESCAPE'07 Proceedings of the First international conference on Combinatorics, Algorithms, Probabilistic and Experimental Methodologies

Lempel-Ziv factorization revisited

CPM'11 Proceedings of the 22nd annual conference on Combinatorial pattern matching
Space-Efficient Preprocessing Schemes for Range Minimum Queries on Static Arrays

SIAM Journal on Computing
Computing Lempel-Ziv factorization online

MFCS'12 Proceedings of the 37th international conference on Mathematical Foundations of Computer Science
A comparison of index-based Lempel-Ziv LZ77 factorization algorithms

ACM Computing Surveys (CSUR)
Computing regularities in strings: A survey

European Journal of Combinatorics
On compressing and indexing repetitive sequences

Theoretical Computer Science

Abstract

We present a novel algorithm for finding the longest factors in a text, for which the working space is proportional to the size of the history text. Moreover, our algorithm is online and exact: unlike the previous batch algorithms [4, 5, 6, 7, 14], which need to read the entire input beforehand, our algorithm reports the longest match just after reading each character. This algorithm can be directly used for data compression, pattern analysis, and data mining. Our algorithm also supports a window buffer, in that we can bound the working space by discarding the history from the oldest character. Using the dynamic rank/select dictionary [17], our algorithm requires n log σ + O(n log σ) + O(n) bits of working space, O(log^3 n) time per character, and O(n log^3 n) total time, where n is the length of the history and σ is the alphabet size. We implemented our algorithm and compared it with the recent algorithms [4, 5, 14] in terms of speed and working space. We found that our algorithm works with a smaller working space, less than 1/2 of that of the previous methods on real-world data, with a reasonable decline in speed.
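
To make the quantity being computed concrete, here is a minimal brute-force sketch in Python. It is not the algorithm of the paper (which relies on a dynamic rank/select dictionary); it is a naive baseline that assumes the usual definition, where the longest previous factor at position i is the longest substring starting at i that also starts at some earlier position, with overlaps allowed. The function name `longest_previous_factors` and the `window` parameter are hypothetical, and the way `window` restricts candidate start positions is only an assumed reading of the window buffer mentioned in the abstract.

```python
def longest_previous_factors(text, window=None):
    """Naive baseline, not the paper's method.

    Returns lpf, where lpf[i] is the length of the longest substring
    starting at i that also starts at an earlier position j < i
    (the two occurrences may overlap).

    window (hypothetical parameter): if given, only the most recent
    `window` start positions before i are considered as candidates.
    """
    n = len(text)
    lpf = [0] * n
    for i in range(n):
        # Oldest candidate start position still kept in the history.
        start = 0 if window is None else max(0, i - window)
        best = 0
        for j in range(start, i):
            k = 0
            # Extend the match character by character; j + k < i + k < n.
            while i + k < n and text[i + k] == text[j + k]:
                k += 1
            best = max(best, k)
        lpf[i] = best
    return lpf


if __name__ == "__main__":
    print(longest_previous_factors("abab"))
    print(longest_previous_factors("abcab", window=2))
```

This baseline stores the whole text and takes time quadratic or worse in n, which is the kind of cost the online, space-efficient approach in the abstract is designed to avoid.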