Surprisingly enough, it is not yet known how to directly build a suffix array that indexes only the k positions at word boundaries of a text T[1, n] in O(n) time and O(k) space in addition to T. We propose a class-note solution to this problem that achieves these optimal time and space bounds. Word-based indexes achieving the same time and space bounds were already known for suffix trees [1, 2] and (compact) DAWGs [3, 4]. Compared with those word indexes, our solution inherits the simplicity and efficiency of suffix arrays, and we therefore foresee applications in word-based approaches to data compression [5] and computational linguistics [6]. To support this, we ran a large set of experiments showing that word-based suffix arrays can be constructed twice as fast as their full-text counterparts, with a working space as low as 20%. The smaller size of the final word-based suffix array also improves query time (i.e., fewer random-access binary-search steps), making queries faster by a factor of up to 3.
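To make the object of study concrete, here is a minimal Python sketch of a word-based suffix array and the binary search over it. It is not the construction proposed in the paper: it sorts the k word-start suffixes naively, so it meets neither the O(n) time bound nor the O(k) extra-space bound. The function names (`word_starts`, `build_word_suffix_array`, `search`) and the choice of whitespace as the word delimiter are assumptions made for illustration only.

```python
import re


def word_starts(text):
    """Start offsets of maximal non-whitespace runs (assumed word boundaries)."""
    return [m.start() for m in re.finditer(r"\S+", text)]


def build_word_suffix_array(text):
    """Naive construction: sort the k word-start positions by the suffix
    beginning at each one. Illustrative only; the slices used as sort keys
    copy whole suffixes, so this is far from O(n) time / O(k) space."""
    return sorted(word_starts(text), key=lambda i: text[i:])


def search(text, wsa, pattern):
    """Text positions, in increasing order, where `pattern` occurs starting
    at a word boundary. Two binary searches over the k entries of `wsa`."""
    m = len(pattern)

    # Lower bound: first entry whose length-m prefix is >= pattern.
    lo, hi = 0, len(wsa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[wsa[mid]:wsa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first entry whose length-m prefix is > pattern.
    hi = len(wsa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[wsa[mid]:wsa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(wsa[first:lo])


text = "to be or not to be"
wsa = build_word_suffix_array(text)
print(wsa)                        # [16, 3, 9, 6, 13, 0]
print(search(text, wsa, "to be")) # [0, 13]
```

Because the array holds only k entries rather than n, each binary search takes on the order of log k steps instead of log n, which is the effect on query time that the abstract refers to. Occurrences that do not begin at a word boundary (for example the "o" inside "to") are deliberately not reported.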