A bridging model for parallel computation. Communications of the ACM.
Suffix arrays: a new method for on-line string searches. SIAM Journal on Computing.
Compact pat trees.
A Space-Economical Suffix Tree Construction Algorithm. Journal of the ACM (JACM).
STOC '00: Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing.
An analysis of the Burrows-Wheeler transform. Journal of the ACM (JACM).
Time-space trade-offs for compressed suffix arrays. Information Processing Letters.
Succinct representations of lcp information and improvements in the compressed suffix arrays. SODA '02: Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms.
High-order entropy-compressed text indexes. SODA '03: Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms.
Compressed Text Databases with Efficient Query Algorithms Based on the Compressed Suffix Array. ISAAC '00: Proceedings of the 11th International Conference on Algorithms and Computation.
Proceedings of the 16th Conference on Foundations of Software Technology and Theoretical Computer Science.
Opportunistic data structures with applications. FOCS '00: Proceedings of the 41st Annual Symposium on Foundations of Computer Science.
Succinct static data structures.
Compact suffix array: a space-efficient full-text index. Fundamenta Informaticae - Special issue on computing patterns in strings.
When indexing equals compression: experiments with compressing suffix arrays and applications. SODA '04: Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms.
Indexing text using the Ziv-Lempel trie. Journal of Discrete Algorithms - SPIRE 2002.
Linear pattern matching algorithms. SWAT '73: Proceedings of the 14th Annual Symposium on Switching and Automata Theory.
Journal of the ACM (JACM).
Succinct suffix arrays based on run-length encoding. Nordic Journal of Computing.
ACM Computing Surveys (CSUR).
Compressed representations of sequences and full-text indexes. ACM Transactions on Algorithms (TALG).
Algorithms and data structures for external memory. Foundations and Trends® in Theoretical Computer Science.
Run-Length Compressed Indexes Are Superior for Highly Repetitive Sequence Collections. SPIRE '08: Proceedings of the 15th International Symposium on String Processing and Information Retrieval.
Compression, indexing, and retrieval for massive string data. CPM '10: Proceedings of the 21st Annual Conference on Combinatorial Pattern Matching.
Parallel and distributed compressed indexes. CPM '10: Proceedings of the 21st Annual Conference on Combinatorial Pattern Matching.
Inverted files versus suffix arrays for locating patterns in primary memory. SPIRE '06: Proceedings of the 13th International Conference on String Processing and Information Retrieval.
A Lempel-Ziv text index on secondary storage. CPM '07: Proceedings of the 18th Annual Conference on Combinatorial Pattern Matching.
Compressed text indexes with fast locate. CPM '07: Proceedings of the 18th Annual Conference on Combinatorial Pattern Matching.
One of the most relevant succinct suffix array proposals in the literature is the Compressed Suffix Array (CSA) of Sadakane [ISAAC 2000]. The CSA needs $n(H_0 + O(\log\log\sigma))$ bits of space, where $n$ is the text size, $\sigma$ is the alphabet size, and $H_0$ is the zero-order entropy of the text. The number of occurrences of a pattern of length $m$ can be computed in $O(m\log n)$ time. Most notably, the CSA does not need the text to be available separately in order to operate. The CSA simulates a binary search over the suffix array in which the query is compared against text substrings; these substrings are extracted from the CSA itself by following irregular access patterns over the structure. Sadakane [SODA 2002] proposed using backward searching on the CSA, in a fashion similar to the FM-index of Ferragina and Manzini [FOCS 2000], and showed that the CSA can then be searched in $O(m)$ time whenever $\sigma = O(\mathrm{polylog}(n))$. In this paper we consider other consequences of applying backward searching to the CSA. The most remarkable one is that, unlike all previous proposals, we do not need any complicated sublinear structures based on the Four-Russians technique (such as constant-time rank and select queries on bit arrays). We show that sampling and compression are enough to achieve $O(m\log n)$ query time using less space than the original structure. It is also possible to trade structure space for search time. Furthermore, the regular access pattern of backward searching permits an efficient secondary-memory implementation, in which the search takes $O(m\log_B n)$ disk accesses, where $B$ is the disk block size. Finally, it permits a distributed implementation with optimal speedup and negligible communication effort.
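The leading term of the CSA space bound, $nH_0$ bits, depends only on symbol frequencies. The minimal Python sketch below is our own illustration (the function names are hypothetical): it computes $H_0 = -\sum_c (n_c/n)\log_2(n_c/n)$ and $nH_0$. The $O(\log\log\sigma)$ per-symbol overhead hides constants and is deliberately not modeled.

```python
from collections import Counter
from math import log2


def zero_order_entropy(text: str) -> float:
    """H0 in bits per symbol: -sum over symbols c of (n_c/n) * log2(n_c/n)."""
    n = len(text)
    return -sum((k / n) * log2(k / n) for k in Counter(text).values())


def leading_space_term_bits(text: str) -> float:
    """n * H0, the dominant term of the n(H0 + O(log log sigma)) bound.
    The O(log log sigma) overhead hides constants and is not modeled here."""
    return len(text) * zero_order_entropy(text)


if __name__ == "__main__":
    sample = "mississippi"
    print(f"H0   = {zero_order_entropy(sample):.3f} bits/symbol")
    print(f"n*H0 = {leading_space_term_bits(sample):.1f} bits")
```

A text over four equiprobable symbols has $H_0 = 2$, so its leading term is exactly two bits per symbol; skewed frequencies push $H_0$, and hence the structure, below $\log_2\sigma$ bits per symbol.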
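The binary search that the original CSA simulates can be sketched with nothing but the $\Psi$ function (the row of the suffix starting one position later) and the first row of each symbol's block, which is why the text itself need not be stored. This is a minimal Python sketch under stated assumptions, not Sadakane's implementation: plain lists stand in for the compressed $\Psi$, the suffix array is built naively and then discarded, the text ends with a unique terminator symbol '$' that sorts before every other symbol, the pattern does not contain '$', and all function names are hypothetical. Each comparison extracts $m$ symbols by hopping through $\Psi$, so counting takes $O(m\log n)$ time, and the hops land on unrelated rows, which is the irregular access pattern described above.

```python
from bisect import bisect_right


def build_csa(text):
    """Naive stand-in for CSA construction (quadratic; for illustration only).
    Assumes text ends with a unique terminator '$' that sorts first.
    Keeps only Psi and the per-symbol block starts; text and SA are discarded."""
    n = len(text)
    sa = sorted(range(n), key=lambda i: text[i:])
    inv = [0] * n
    for row, pos in enumerate(sa):
        inv[pos] = row
    # Psi(row) = row of the suffix starting one position later (wrapping after '$').
    psi = [inv[(sa[row] + 1) % n] for row in range(n)]
    alphabet = sorted(set(text))
    starts, total = [], 0
    for c in alphabet:
        starts.append(total)  # first row whose suffix begins with c
        total += text.count(c)
    return psi, alphabet, starts


def extract(row, length, psi, alphabet, starts):
    """Read `length` symbols of the suffix at `row` by repeatedly following Psi."""
    out = []
    for _ in range(length):
        out.append(alphabet[bisect_right(starts, row) - 1])  # symbol whose block holds row
        row = psi[row]
    return "".join(out)


def count_forward(pattern, psi, alphabet, starts):
    """Binary search over rows; every comparison extracts m symbols: O(m log n)."""
    n, m = len(psi), len(pattern)

    def prefix(row):
        return extract(row, m, psi, alphabet, starts)

    lo, hi = 0, n  # find the first row whose length-m prefix is >= pattern
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    left, hi = lo, n  # find the first row whose length-m prefix is > pattern
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo - left


if __name__ == "__main__":
    psi, alphabet, starts = build_csa("mississippi$")
    print(extract(5, 11, psi, alphabet, starts))         # mississippi
    print(count_forward("ssi", psi, alphabet, starts))   # 2
```

Extracted prefixes may wrap past '$' into the start of the text, but since '$' is unique and smallest, every comparison is already decided at or before it, so the row order is preserved.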
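Backward searching avoids extracting text substrings altogether. Suffixes that start with the same symbol $c$ are sorted by what follows $c$, so $\Psi$ is increasing inside $c$'s block of rows. If $[sp, ep)$ holds the rows of suffixes prefixed by $P[i+1..m]$, the rows prefixed by $P[i..m]$ are exactly those in $P[i]$'s block whose $\Psi$ value lies in $[sp, ep)$, and two binary searches find them: $O(\log n)$ per pattern symbol and $O(m\log n)$ overall, with no rank or select structure. The minimal Python sketch below illustrates this idea and does not reproduce the paper's data structure: plain lists stand in for the sampled, compressed $\Psi$ the approach relies on, construction is naive, the text is assumed to end with a unique terminator '$' that sorts first, and the function names are hypothetical.

```python
from bisect import bisect_left


def build_csa(text):
    """Naive stand-in for CSA construction (quadratic; for illustration only).
    Assumes text ends with a unique terminator '$' that sorts first."""
    n = len(text)
    sa = sorted(range(n), key=lambda i: text[i:])
    inv = [0] * n
    for row, pos in enumerate(sa):
        inv[pos] = row
    psi = [inv[(sa[row] + 1) % n] for row in range(n)]  # next-suffix row
    alphabet = sorted(set(text))
    starts, total = [], 0
    for c in alphabet:
        starts.append(total)  # first row whose suffix begins with c
        total += text.count(c)
    return psi, alphabet, starts


def backward_count(pattern, psi, alphabet, starts):
    """Count occurrences by narrowing a half-open row range from the last symbol."""
    n = len(psi)
    sp, ep = 0, n  # rows of all suffixes: the empty pattern suffix matches everything
    for c in reversed(pattern):
        k = bisect_left(alphabet, c)
        if k == len(alphabet) or alphabet[k] != c:
            return 0  # symbol never occurs in the text
        b = starts[k]
        e = starts[k + 1] if k + 1 < len(starts) else n
        # psi[b:e] is increasing, so rows whose Psi lands in [sp, ep) are contiguous.
        sp, ep = bisect_left(psi, sp, b, e), bisect_left(psi, ep, b, e)
        if sp >= ep:
            return 0
    return ep - sp


if __name__ == "__main__":
    psi, alphabet, starts = build_csa("mississippi$")
    print(backward_count("ssi", psi, alphabet, starts))  # 2
```

Each step reads only the contiguous run of $\Psi$ belonging to one symbol's block, which gives some intuition for why the access pattern of backward searching is regular enough for the secondary-memory and distributed settings discussed above.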