Suffix arrays on words

Authors:
Paolo Ferragina;Johannes Fischer
Affiliations:
Dipartimento di Informatica, University of Pisa;Institut für Informatik, Ludwig-Maximilians-Universität München
Venue:
CPM'07 Proceedings of the 18th annual conference on Combinatorial Pattern Matching
Year:
2007

References: 21
Cited by: 7

Suffix arrays: a new method for on-line string searches

SIAM Journal on Computing
Guidelines for presentation and comparison of indexing techniques

ACM SIGMOD Record
Algorithms on strings, trees, and sequences: computer science and computational biology

Algorithms on strings, trees, and sequences: computer science and computational biology
Managing gigabytes (2nd ed.): compressing and indexing documents and images

Managing gigabytes (2nd ed.): compressing and indexing documents and images
Word-based block-sorting text compression

ACSC '01 Proceedings of the 24th Australasian conference on Computer science
Nearest common ancestors: a survey and a new distributed algorithm

Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
Sparse Suffix Trees

COCOON '96 Proceedings of the Second Annual International Conference on Computing and Combinatorics
Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications

CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching
Breaking a Time-and-Space Barrier in Constructing Full-Text Indices

FOCS '03 Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science
MARSYAS: a framework for audio analysis

Organised Sound
Replacing suffix trees with enhanced suffix arrays

Journal of Discrete Algorithms - SPIRE 2002
Engineering a Lightweight Suffix Array Construction Algorithm

Algorithmica
Handbook of Computational Molecular Biology (Chapman & Hall/CRC Computer and Information Science Series)

Handbook of Computational Molecular Biology (Chapman & Hall/CRC Computer and Information Science Series)
Compressed full-text indexes

ACM Computing Surveys (CSUR)
Linear work suffix array construction

Journal of the ACM (JACM)
Note: A simple storage scheme for strings achieving entropy bounds

Theoretical Computer Science
An efficient, versatile approach to suffix sorting

Journal of Experimental Algorithmics (JEA)
On-line construction of compact directed acyclic word graphs

Discrete Applied Mathematics
On-Line linear-time construction of word suffix trees

CPM'06 Proceedings of the 17th Annual conference on Combinatorial Pattern Matching
Sparse directed acyclic word graphs

SPIRE'06 Proceedings of the 13th international conference on String Processing and Information Retrieval
A new succinct representation of RMQ-information and improvements in the enhanced suffix array

ESCAPE'07 Proceedings of the First international conference on Combinatorics, Algorithms, Probabilistic and Experimental Methodologies

Speeding Up Pattern Matching by Text Sampling

SPIRE '08 Proceedings of the 15th International Symposium on String Processing and Information Retrieval
Index structures for efficiently searching natural language text

CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Engineering basic algorithms of an in-memory text search engine

ACM Transactions on Information Systems (TOIS)
Sparse and truncated suffix trees on variable-length codes

CPM'11 Proceedings of the 22nd annual conference on Combinatorial pattern matching
String matching with alphabet sampling

Journal of Discrete Algorithms
Improving tweet stream classification by detecting changes in word probability

SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Sparse suffix tree construction in small space

ICALP'13 Proceedings of the 40th international conference on Automata, Languages, and Programming - Volume Part I

Abstract

Surprisingly enough, it is not yet known how to directly build a suffix array that indexes just the k positions at word boundaries of a text T[1, n], taking O(n) time and O(k) space in addition to T. We propose a class-note solution to this problem that achieves these optimal time and space bounds. Word-based versions of indexes achieving the same time/space bounds were already known for suffix trees [1, 2] and (compact) DAWGs [3, 4]. Compared with these other word indexes, our solution inherits the simplicity and efficiency of suffix arrays, and we therefore foresee applications in word-based approaches to data compression [5] and computational linguistics [6]. To support this, we have run a large set of experiments showing that word-based suffix arrays can be constructed twice as fast as their full-text counterparts, using as little as 20% of the working space. The smaller size of the final word-based suffix array also improves query time (i.e., fewer random-access binary-search steps), making queries faster by a factor of up to 3.
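
To make the indexed object concrete, the following is a minimal sketch of a word-based suffix array, not the construction proposed in the paper. It assumes words are maximal runs of non-whitespace characters, and it sorts the k word-start positions naively by comparing whole suffixes, so it meets neither the O(n) time bound nor the O(k) space bound. The names `word_suffix_array` and `find_word_aligned` are hypothetical and chosen only for this illustration.

```python
import re


def word_suffix_array(text):
    """Return the word-start positions of `text`, sorted by the suffix starting there.

    Assumption: a word is a maximal run of non-whitespace characters.
    Naive construction: each sort key is a full suffix copy, so this is
    far from the O(n) time / O(k) extra space bounds discussed above.
    """
    starts = [m.start() for m in re.finditer(r"\S+", text)]
    return sorted(starts, key=lambda i: text[i:])


def find_word_aligned(text, wsa, pattern):
    """Return sorted positions where `pattern` occurs starting at a word boundary.

    Two binary searches over the k entries of `wsa` locate the contiguous
    range of suffixes that have `pattern` as a prefix.
    """
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(wsa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[wsa[mid]:wsa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(wsa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[wsa[mid]:wsa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(wsa[first:lo])


if __name__ == "__main__":
    t = "to be or not to be"
    wsa = word_suffix_array(t)
    print(wsa)
    print(find_word_aligned(t, wsa, "to be"))
```

Because the array holds only k entries rather than n, each binary search takes O(log k) steps instead of O(log n), which is the effect on query time that the abstract refers to. Occurrences that begin inside a word are deliberately not reported by this index.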