Suffix arrays: what are they good for?
ADC '06 Proceedings of the 17th Australasian Database Conference - Volume 49
When indexing equals compression: Experiments with compressing suffix arrays and applications
ACM Transactions on Algorithms (TALG)
ACM Computing Surveys (CSUR)
Succinct data structures for flexible text retrieval systems
Journal of Discrete Algorithms
Compressed representations of sequences and full-text indexes
ACM Transactions on Algorithms (TALG)
Compressed indexes for dynamic text collections
ACM Transactions on Algorithms (TALG)
A taxonomy of suffix array construction algorithms
ACM Computing Surveys (CSUR)
An efficient, versatile approach to suffix sorting
Journal of Experimental Algorithmics (JEA)
Ultra-succinct representation of ordered trees
SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
Theoretical Computer Science
Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets
ACM Transactions on Algorithms (TALG)
Theoretical Computer Science
Rank and select revisited and extended
Theoretical Computer Science
Fast blocking of undesirable web pages on client PC by discriminating URL using neural networks
Expert Systems with Applications: An International Journal
Compact dictionaries for variable-length keys and data with applications
ACM Transactions on Algorithms (TALG)
Dynamic entropy-compressed sequences and full-text indexes
ACM Transactions on Algorithms (TALG)
A compressed self-index using a Ziv---Lempel dictionary
Information Retrieval
An(other) Entropy-Bounded Compressed Suffix Tree
CPM '08 Proceedings of the 19th annual symposium on Combinatorial Pattern Matching
On the Redundancy of Succinct Data Structures
SWAT '08 Proceedings of the 11th Scandinavian workshop on Algorithm Theory
Compressed text indexes: From theory to practice
Journal of Experimental Algorithmics (JEA)
Run-Length Compressed Indexes Are Superior for Highly Repetitive Sequence Collections
SPIRE '08 Proceedings of the 15th International Symposium on String Processing and Information Retrieval
Cell probe lower bounds for succinct data structures
SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
Storage and Retrieval of Individual Genomes
RECOMB 2'09 Proceedings of the 13th Annual International Conference on Research in Computational Molecular Biology
Broadword Computing and Fibonacci Code Speed Up Compressed Suffix Arrays
SEA '09 Proceedings of the 8th International Symposium on Experimental Algorithms
Text Indexing, Suffix Sorting, and Data Compression: Common Problems and Techniques
CPM '09 Proceedings of the 20th Annual Symposium on Combinatorial Pattern Matching
Engineering a compressed suffix tree implementation
Journal of Experimental Algorithmics (JEA)
Dynamic rank/select structures with applications to run-length encoded texts
Theoretical Computer Science
Compressed Suffix Arrays for Massive Data
SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval
On Entropy-Compressed Text Indexing in External Memory
SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval
A Linear-Time Burrows-Wheeler Transform Using Induced Sorting
SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval
Faster entropy-bounded compressed suffix trees
Theoretical Computer Science
Succinct Index for Dynamic Dictionary Matching
ISAAC '09 Proceedings of the 20th International Symposium on Algorithms and Computation
Engineering a compressed suffix tree implementation
WEA'07 Proceedings of the 6th international conference on Experimental algorithms
Approximate string matching with Lempel-Ziv compressed indexes
SPIRE'07 Proceedings of the 14th international conference on String processing and information retrieval
Sampled longest common prefix array
CPM'10 Proceedings of the 21st annual conference on Combinatorial pattern matching
Compression, indexing, and retrieval for massive string data
CPM'10 Proceedings of the 21st annual conference on Combinatorial pattern matching
Practical approaches to reduce the space requirement of lempel-ziv--based compressed text indices
Journal of Experimental Algorithmics (JEA)
Computing the inverse sort transform in linear time
ACM Transactions on Algorithms (TALG)
String retrieval for multi-pattern queries
SPIRE'10 Proceedings of the 17th international conference on String processing and information retrieval
Faster compressed dictionary matching
SPIRE'10 Proceedings of the 17th international conference on String processing and information retrieval
SPIRE'10 Proceedings of the 17th international conference on String processing and information retrieval
The gapped suffix array: a new index structure for fast approximate matching
SPIRE'10 Proceedings of the 17th international conference on String processing and information retrieval
Indexing methods for approximate dictionary searching: Comparative analysis
Journal of Experimental Algorithmics (JEA)
Note: Combined data structure for previous- and next-smaller-values
Theoretical Computer Science
Space-efficient construction of Lempel-Ziv compressed text indexes
Information and Computation
A quick tour on suffix arrays and compressed suffix arrays
Theoretical Computer Science
Space-efficient substring occurrence estimation
Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Succinct indexes for strings, binary relations and multilabeled trees
ACM Transactions on Algorithms (TALG)
Inverted indexes for phrases and strings
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Indexing finite language representation of population genotypes
WABI'11 Proceedings of the 11th international conference on Algorithms in bioinformatics
Compressed text indexing with wildcards
SPIRE'11 Proceedings of the 18th international conference on String processing and information retrieval
Compressed indexes for aligned pattern matching
SPIRE'11 Proceedings of the 18th international conference on String processing and information retrieval
Semi-indexing semi-structured data in tiny space
Proceedings of the 20th ACM international conference on Information and knowledge management
A linear size index for approximate pattern matching
Journal of Discrete Algorithms
Space-Efficient Preprocessing Schemes for Range Minimum Queries on Static Arrays
SIAM Journal on Computing
Ultra-succinct representation of ordered trees with applications
Journal of Computer and System Sciences
A compressed self-index using a ziv-lempel dictionary
SPIRE'06 Proceedings of the 13th international conference on String Processing and Information Retrieval
Efficient Maximal Repeat Finding Using the Burrows-Wheeler Transform and Wavelet Tree
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
The myriad virtues of wavelet trees
ICALP'06 Proceedings of the 33rd international conference on Automata, Languages and Programming - Volume Part I
A randomized numerical aligner (rNA)
LATA'10 Proceedings of the 4th international conference on Language and Automata Theory and Applications
Encoding 2d range maximum queries
ISAAC'11 Proceedings of the 22nd international conference on Algorithms and Computation
Succinct indexes for circular patterns
ISAAC'11 Proceedings of the 22nd international conference on Algorithms and Computation
Worst-case efficient single and multiple string matching on packed texts in the word-RAM model
Journal of Discrete Algorithms
BpMatch: An Efficient Algorithm for a Segmental Analysis of Genomic Sequences
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
CPM'12 Proceedings of the 23rd Annual conference on Combinatorial Pattern Matching
Towards an optimal space-and-query-time index for top-k document retrieval
CPM'12 Proceedings of the 23rd Annual conference on Combinatorial Pattern Matching
A randomized Numerical Aligner (rNA)
Journal of Computer and System Sciences
Dynamic rank-select structures with applications to run-length encoded texts
CPM'07 Proceedings of the 18th annual conference on Combinatorial Pattern Matching
Fast and practical algorithms for computing all the runs in a string
CPM'07 Proceedings of the 18th annual conference on Combinatorial Pattern Matching
Compressed data structures with relevance
Proceedings of the 21st ACM international conference on Information and knowledge management
On position restricted substring searching in succinct space
Journal of Discrete Algorithms
Faster compressed dictionary matching
Theoretical Computer Science
Compressed text indexing with wildcards
Journal of Discrete Algorithms
On compressing and indexing repetitive sequences
Theoretical Computer Science
Space-efficient data structures for Top-k completion
Proceedings of the 22nd international conference on World Wide Web
Searching similar segments over textual event sequences
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Dynamic compressed strings with random access
ICALP'13 Proceedings of the 40th international conference on Automata, Languages, and Programming - Volume Part I
On the combinatorics of suffix arrays
Information Processing Letters
On compressing permutations and adaptive sorting
Theoretical Computer Science
Compressed property suffix trees
Information and Computation
Cross-document pattern matching
Journal of Discrete Algorithms
A Compressed Suffix Tree Based Implementation With Low Peak Memory Usage
Electronic Notes in Theoretical Computer Science (ENTCS)
Journal of Discrete Algorithms
Hi-index | 0.01 |
The proliferation of online text, such as found on the World Wide Web and in online databases, motivates the need for space-efficient text indexing methods that support fast string searching. We model this scenario as follows: Consider a text $T$ consisting of $n$ symbols drawn from a fixed alphabet $\Sigma$. The text $T$ can be represented in $n \lg |\Sigma|$ bits by encoding each symbol with $\lg |\Sigma|$ bits. The goal is to support fast online queries for searching any string pattern $P$ of $m$ symbols, with $T$ being fully scanned only once, namely, when the index is created at preprocessing time. The text indexing schemes published in the literature are greedy in terms of space usage: they require $\Omega(n \lg n)$ additional bits of space in the worst case. For example, in the standard unit cost RAM, suffix trees and suffix arrays need $\Omega(n)$ memory words, each of $\Omega(\lg n)$ bits. These indexes are larger than the text itself by a multiplicative factor of $\Omega(\smash{\lg_{|\Sigma|} n})$, which is significant when $\Sigma$ is of constant size, such as in \textsc{ascii} or \textsc{unicode}. On the other hand, these indexes support fast searching, either in $O(m \lg |\Sigma|)$ time or in $O(m + \lg n)$ time, plus an output-sensitive cost $O(\mathit{occ})$ for listing the $\mathit{occ}$ pattern occurrences. We present a new text index that is based upon compressed representations of suffix arrays and suffix trees. It achieves a fast $\smash{O(m /\lg_{|\Sigma|} n + \lg_{|\Sigma|}^\epsilon n)}$ search time in the worst case, for any constant $0