Efficient Algorithms for Two Extensions of LPF Table: The Power of Suffix Arrays

Authors:
Maxime Crochemore;Costas S. Iliopoulos;Marcin Kubica;Wojciech Rytter;Tomasz Waleń
Affiliations:
Dept. of Computer Science, King's College London, London, UK WC2R 2LS and Université Paris-Est, France;Dept. of Computer Science, King's College London, London, UK WC2R 2LS and Digital Ecosystems & Business Intelligence Institute, Curtin University of Technology, Perth, Australia WA 6845;Institute of Informatics, University of Warsaw, Warsaw, Poland;Institute of Informatics, University of Warsaw, Warsaw, Poland and Faculty of Math. and Informatics, Copernicus University, Torun, Poland;Institute of Informatics, University of Warsaw, Warsaw, Poland
Venue:
SOFSEM '10 Proceedings of the 36th Conference on Current Trends in Theory and Practice of Computer Science
Year:
2009

Abstract

Suffix arrays provide a powerful data structure to solve several questions related to the structure of all the factors of a string. We show how they can be used to compute efficiently two new tables storing different types of previous factors (past segments) of a string. The concept of a longest previous factor is inherent to Ziv-Lempel factorization of strings in text compression, as well as in statistics of repetitions and symmetries. The longest previous reverse factor for a given position i is the longest factor starting at i, such that its reverse copy occurs before, while the longest previous non-overlapping factor is the longest factor v starting at i which has an exact copy occurring before. The previous copies of the factors are required to occur in the prefix ending at position i − 1. We design algorithms computing the table of longest previous reverse factors (LPrF table) and the table of longest previous non-overlapping factors (LPnF table). The latter table is useful to compute repetitions while the former is a useful tool for extracting symmetries. These tables are computed, using two previously computed read-only arrays (SUF and LCP) composing the suffix array, in linear time on any integer alphabet. The tables have not been explicitly considered before, but they have several applications and they are natural extensions of the LPF table which has been studied thoroughly before. Our results improve on the previous ones in several ways. The running time of the computation no longer depends on the size of the alphabet, which drops a log factor. Moreover the newly introduced tables store additional information on the structure of the string, helpful to improve, for example, gapped palindrome detection and text compression using reverse factors.
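
To make the two definitions concrete, the following is a minimal brute-force sketch in Python. It is not the linear-time suffix-array algorithm described in the abstract; it only follows the definitions directly, so it can serve as a reference on small inputs. The function names `lpnf_naive` and `lprf_naive` and the 0-based indexing are assumptions made for this illustration.

```python
def lpnf_naive(s):
    """Longest previous non-overlapping factor table, by brute force.

    table[i] is the length of the longest factor starting at position i
    that also occurs entirely inside the prefix s[0:i].
    """
    n = len(s)
    table = [0] * n
    for i in range(n):
        prefix = s[:i]
        length = 0
        # If a factor occurs in the prefix, so does each of its prefixes,
        # so the length can be grown one character at a time.
        while i + length < n and s[i:i + length + 1] in prefix:
            length += 1
        table[i] = length
    return table


def lprf_naive(s):
    """Longest previous reverse factor table, by brute force.

    table[i] is the length of the longest factor starting at position i
    whose reverse occurs entirely inside the prefix s[0:i].
    """
    n = len(s)
    table = [0] * n
    for i in range(n):
        prefix = s[:i]
        length = 0
        # The reverse of a shorter factor is a suffix of the reverse of a
        # longer one, so the same greedy extension is valid here.
        while i + length < n and s[i:i + length + 1][::-1] in prefix:
            length += 1
        table[i] = length
    return table


if __name__ == "__main__":
    for word in ("aaaa", "abba", "abcab"):
        print(word, lpnf_naive(word), lprf_naive(word))
```

On the word "abba", for instance, the factor "ba" starting at position 2 has no earlier exact copy, but its reverse "ab" occurs in the prefix "ab", so the two tables differ at that position. On "aaaa" the non-overlapping requirement caps the value at position 1 to 1, whereas an overlapping previous factor would have length 3.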