New Perspectives on the Prefix Array

Authors:
W. F. Smyth; Shu Wang
Affiliations:
Algorithms Research Group, Department of Computing & Software, McMaster University, Hamilton, Canada L8S 4K1 and Digital Ecosystems & Business Intelligence Institute, Curtin University, Perth, Aus ...; Algorithms Research Group, Department of Computing & Software, McMaster University, Hamilton, Canada L8S 4K1
Venue:
SPIRE '08 Proceedings of the 15th International Symposium on String Processing and Information Retrieval
Year:
2008

Abstract

In this paper we consider the prefix array π = π[1..n] of a string x = x[1..n], in which π[1] = 0 and, for i > 1, π[i] = k iff k is the largest integer such that x[1..k] = x[i..i+k-1]. The prefix array π is closely related to the border array β: an integer array β[1..n] such that β[i] = k iff the length of the longest border of x[1..i] is k. Border arrays or their variants are used in many string algorithms, and prefix arrays can be used directly for pattern-matching. It is well known that for regular strings π provides all the information that β does; we show, however, that for indeterminate strings (those containing entries that match a subset of the alphabet) π actually provides more information, in fact still enabling all the borders of every prefix of x to be specified. Since many of the entries of π are expected to be zeros, it is natural to represent π in compressed form using integer arrays POS[1..m] and LEN[1..m], where m is the number of nonzero entries in π and π[POS[j]] = LEN[j] iff the j-th nonzero entry in π occurs in position POS[j] and takes the value LEN[j]. The expected value of m is n/σ - 1, where σ is the alphabet size. The straightforward way of computing POS/LEN requires computing π first, and therefore requires O(n) extra space. We describe two Θ(n)-time algorithms, PL1 and PL2, to compute POS/LEN for regular strings using only 8m bytes of storage in addition to the n bytes required for x. PL1 requires about one-third the time of the standard border array algorithm MP on English-language strings; PL2 executes faster than MP on both English-language and highly periodic strings on {a, b}. For indeterminate strings, we describe an extension IPL of PL1 that computes POS/LEN in O(n²) worst-case time (though generally much faster), still using only 8m bytes of additional storage. For both regular and indeterminate strings, the compressed form of π can be used for efficient pattern-matching.
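
The definitions in the abstract can be illustrated with a minimal Python sketch for regular strings. It is not the paper's PL1, PL2, or IPL: it uses the textbook linear-time prefix-array (Z-array) computation and the "straightforward" route to POS/LEN that builds π first and so needs O(n) extra space. The function names `prefix_array`, `border_array_from_prefix`, and `compress` are hypothetical, chosen here for illustration. Python lists are 0-based, so `pi[i]` stands for π[i+1]; the entries of `pos` are shifted back to the abstract's 1-based positions.

```python
def prefix_array(x):
    """Prefix array of a regular string x, with pi[0] = 0 (the abstract's π[1] = 0)."""
    n = len(x)
    pi = [0] * n
    left = right = 0  # x[left:right] is the rightmost substring known to match a prefix
    for i in range(1, n):
        if i < right:
            # Reuse the value already computed inside the known matching window.
            pi[i] = min(right - i, pi[i - left])
        # Extend the match by direct character comparison.
        while i + pi[i] < n and x[pi[i]] == x[i + pi[i]]:
            pi[i] += 1
        if i + pi[i] > right:
            left, right = i, i + pi[i]
    return pi


def border_array_from_prefix(pi):
    """Border array beta of a regular string, derived from its prefix array."""
    n = len(pi)
    beta = [0] * n
    # A prefix match of length k starting at i yields a border of length k
    # for the prefix that ends at position i + k - 1.
    for i in range(1, n):
        if pi[i] > 0:
            end = i + pi[i] - 1
            beta[end] = max(beta[end], pi[i])
    # A border of length b ending at j + 1 implies a border of length b - 1 ending at j.
    for j in range(n - 2, -1, -1):
        beta[j] = max(beta[j], beta[j + 1] - 1)
    return beta


def compress(pi):
    """Compressed form (POS, LEN): 1-based positions and values of the nonzero entries."""
    pos, length = [], []
    for i, k in enumerate(pi):
        if k > 0:
            pos.append(i + 1)  # shift to the 1-based indexing used in the abstract
            length.append(k)
    return pos, length
```

For x = "abaababa" this sketch gives π = [0, 0, 1, 3, 0, 3, 0, 1], β = [0, 0, 1, 1, 2, 3, 2, 3], POS = [3, 4, 6, 8] and LEN = [1, 3, 3, 1], so m = 4. The conversion from π to β shown here relies on character equality being transitive, which holds for regular strings only.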