A locally adaptive data compression scheme
Communications of the ACM
Bonsai: a compact representation of trees
Software—Practice & Experience
Suffix arrays: a new method for on-line string searches
SIAM Journal on Computing
Arithmetic coding for data compression
Communications of the ACM
Let sleeping files lie: pattern matching in Z-compressed files
Journal of Computer and System Sciences
The art of computer programming, volume 3: (2nd ed.) sorting and searching
The art of computer programming, volume 3: (2nd ed.) sorting and searching
Fast searching on compressed text allowing errors
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Fast algorithms for sorting and searching strings
SODA '97 Proceedings of the eighth annual ACM-SIAM symposium on Discrete algorithms
Managing gigabytes (2nd ed.): compressing and indexing documents and images
Managing gigabytes (2nd ed.): compressing and indexing documents and images
STOC '00 Proceedings of the thirty-second annual ACM symposium on Theory of computing
Opportunistic data structures with applications
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
An analysis of the Burrows—Wheeler transform
Journal of the ACM (JACM)
High-order entropy-compressed text indexes
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
Efficient Index Structures for String Databases
Proceedings of the 27th International Conference on Very Large Data Bases
A Fast Index for Semistructured Data
Proceedings of the 27th International Conference on Very Large Data Bases
Optimal Exact Strring Matching Based on Suffix Arrays
SPIRE 2002 Proceedings of the 9th International Symposium on String Processing and Information Retrieval
Indexing Text Using the Ziv-Lempel Trie
SPIRE 2002 Proceedings of the 9th International Symposium on String Processing and Information Retrieval
Trade Off Between Compression and Search Times in Compact Suffix Array
ALENEX '01 Revised Papers from the Third International Workshop on Algorithm Engineering and Experimentation
A Compressed Breadth-First Search for Satisfiability
ALENEX '02 Revised Papers from the 4th International Workshop on Algorithm Engineering and Experiments
Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications
CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching
Engineering a Lightweight Suffix Array Construction Algorithm
ESA '02 Proceedings of the 10th Annual European Symposium on Algorithms
Searching BWT Compressed Text with the Boyer-Moore Algorithm and Binary Search
DCC '02 Proceedings of the Data Compression Conference
Compact suffix array: a space-efficient full-text index
Fundamenta Informaticae - Special issue on computing patterns in strings
When indexing equals compression: experiments with compressing suffix arrays and applications
SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
Replacing suffix trees with enhanced suffix arrays
Journal of Discrete Algorithms - SPIRE 2002
Indexing text using the Ziv-Lempel trie
Journal of Discrete Algorithms - SPIRE 2002
New text indexing functionalities of the compressed suffix arrays
Journal of Algorithms
Resolution cannot polynomially simulate compressed-BFS
Annals of Mathematics and Artificial Intelligence
A categorization theorem on suffix arrays with applications to space efficient text indexes
SODA '05 Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms
Pattern Matching in LZW Compressed Files
IEEE Transactions on Computers
Suffix arrays: what are they good for?
ADC '06 Proceedings of the 17th Australasian Database Conference - Volume 49
Succinct suffix arrays based on run-length encoding
Nordic Journal of Computing
When indexing equals compression: Experiments with compressing suffix arrays and applications
ACM Transactions on Algorithms (TALG)
ACM Computing Surveys (CSUR)
EDBT '08 Proceedings of the 11th international conference on Extending database technology: Advances in database technology
Implementing the LZ-index: Theory versus practice
Journal of Experimental Algorithmics (JEA)
Compressed text indexes: From theory to practice
Journal of Experimental Algorithmics (JEA)
Cell probe lower bounds for succinct data structures
SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
Dependability Improvement for PPM Compressed Data by Using Compression Pattern Matching
IEICE - Transactions on Information and Systems
Efficient construction of FM-index using overlapping block processing for large scale texts
ECIR'07 Proceedings of the 29th European conference on IR research
An experimental study of compressed indexing and local alignments of DNA
COCOA'07 Proceedings of the 1st international conference on Combinatorial optimization and applications
Indexing similar DNA sequences
AAIM'10 Proceedings of the 6th international conference on Algorithmic aspects in information and management
Space-efficient construction of LZ-index
ISAAC'05 Proceedings of the 16th international conference on Algorithms and Computation
CPM'05 Proceedings of the 16th annual conference on Combinatorial Pattern Matching
Succinct suffix arrays based on run-length encoding
CPM'05 Proceedings of the 16th annual conference on Combinatorial Pattern Matching
Efficient implementation of rank and select functions for succinct representation
WEA'05 Proceedings of the 4th international conference on Experimental and Efficient Algorithms
Compact Suffix Array — A Space-Efficient Full-Text Index
Fundamenta Informaticae - Computing Patterns in Strings
WABI'12 Proceedings of the 12th international conference on Algorithms in Bioinformatics
Hardware acceleration of genetic sequence alignment
ARC'13 Proceedings of the 9th international conference on Reconfigurable Computing: architectures, tools, and applications
Hi-index | 0.00 |
The size of electronic data is currently growing at a faster rate than computer memory and disk storage capacities. For this reason compression appears always as an attractive choice, if not mandatory. However space overhead is not the only resource to be optimized when managing large data collections; in fact data turn out to be useful only when properly indexed to support search operations that efficiently extract the user-requested information.Approaches to combine compression and indexing techniques are nowadays receiving more and more attention. A first step towards the design of a compressed full-text index achieving guaranteed performance in the worst case has been recently done in [10]. This index combines the compression algorithm proposed by Burrows and Wheeler [5] with the suffix array data structure [16]. The index is opportunistic in that it takes advantage of the compressibility of the input data by decreasing the space occupancy at no significant asymptotic slowdown in the query performance.In this paper we present an implementation of this index and perform an extensive set of experiments on various text collections. The experiments show that our index is compact (its space occupancy is close to the one achieved by the best known compressors), it is fast in counting the number of pattern occurrences, and the cost of their retrieval is reasonable when they are few (i.e., in case of a selective query). In addition, our experiments show that the FM-index is flexible in that it is possible to trade space occupancy for search time by choosing the amount of auxiliary information stored into it.