Suffix arrays: a new method for on-line string searches
SIAM Journal on Computing
Efficient implementation of suffix trees
Software—Practice & Experience
Algorithms on strings, trees, and sequences: computer science and computational biology
Algorithms on strings, trees, and sequences: computer science and computational biology
Managing gigabytes (2nd ed.): compressing and indexing documents and images
Managing gigabytes (2nd ed.): compressing and indexing documents and images
An experimental study of an opportunistic index
SODA '01 Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms
An analysis of the Burrows—Wheeler transform
Journal of the ACM (JACM)
Succinct representations of lcp information and improvements in the compressed suffix arrays
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Adding Compression to Block Addressing Inverted Indexes
Information Retrieval
Indexing and Retrieval for Genomic Databases
IEEE Transactions on Knowledge and Data Engineering
High-order entropy-compressed text indexes
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
COCOON '96 Proceedings of the Second Annual International Conference on Computing and Combinatorics
A Space and Time Efficient Algorithm for Constructing Compressed Suffix Arrays
COCOON '02 Proceedings of the 8th Annual International Conference on Computing and Combinatorics
Proceedings of the 16th Conference on Foundations of Software Technology and Theoretical Computer Science
Filtration with q-Samples in Approximate String Matching
CPM '96 Proceedings of the 7th Annual Symposium on Combinatorial Pattern Matching
Succinct representation of balanced parentheses, static trees and planar graphs
FOCS '97 Proceedings of the 38th Annual Symposium on Foundations of Computer Science
Breaking a Time-and-Space Barrier in Constructing Full-Text Indices
FOCS '03 Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science
Indexing text using the Ziv-Lempel trie
Journal of Discrete Algorithms - SPIRE 2002
New text indexing functionalities of the compressed suffix arrays
Journal of Algorithms
Journal of the ACM (JACM)
Boosting textual compression in optimal linear time
Journal of the ACM (JACM)
LZgrep: a Boyer–Moore string matching tool for Ziv–Lempel compressed text: Research Articles
Software—Practice & Experience
Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching
SIAM Journal on Computing
Structuring labeled trees for optimal succinctness, and beyond
FOCS '05 Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science
Inverted files for text search engines
ACM Computing Surveys (CSUR)
Compressing and searching XML data via two zips
Proceedings of the 15th international conference on World Wide Web
Succinct suffix arrays based on run-length encoding
Nordic Journal of Computing
When indexing equals compression: Experiments with compressing suffix arrays and applications
ACM Transactions on Algorithms (TALG)
ACM Computing Surveys (CSUR)
Note: A simple storage scheme for strings achieving entropy bounds
Theoretical Computer Science
Compressed representations of sequences and full-text indexes
ACM Transactions on Algorithms (TALG)
A taxonomy of suffix array construction algorithms
ACM Computing Surveys (CSUR)
GLIMPSE: a tool to search through entire file systems
WTEC'94 Proceedings of the USENIX Winter 1994 Technical Conference on USENIX Winter 1994 Technical Conference
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Theoretical Computer Science
Dynamic entropy-compressed sequences and full-text indexes
ACM Transactions on Algorithms (TALG)
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Word-Based Statistical Compressors as Natural Language Compression Boosters
DCC '08 Proceedings of the Data Compression Conference
On Self-Indexing Images - Image Compression with Added Value
DCC '08 Proceedings of the Data Compression Conference
Compressed Text Indexes with Fast Locate
CPM '07 Proceedings of the 18th annual symposium on Combinatorial Pattern Matching
Encyclopedia of Algorithms
Succinct representations of permutations
ICALP'03 Proceedings of the 30th international conference on Automata, languages and programming
Statistical encoding of succinct data structures
CPM'06 Proceedings of the 17th Annual conference on Combinatorial Pattern Matching
Reducing the space requirement of LZ-Index
CPM'06 Proceedings of the 17th Annual conference on Combinatorial Pattern Matching
Inverted files versus suffix arrays for locating patterns in primary memory
SPIRE'06 Proceedings of the 13th international conference on String Processing and Information Retrieval
Space-efficient construction of LZ-index
ISAAC'05 Proceedings of the 16th international conference on Algorithms and Computation
Suffix trays and suffix trists: structures for faster text indexing
ICALP'06 Proceedings of the 33rd international conference on Automata, Languages and Programming - Volume Part I
Compressing and indexing labeled trees, with applications
Journal of the ACM (JACM)
Compressed Suffix Arrays for Massive Data
SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval
Directly Addressable Variable-Length Codes
SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval
Faster entropy-bounded compressed suffix trees
Theoretical Computer Science
Ontologies and semantic mining for bio-technology and chemistry data and patents
Proceedings of the 2nd international workshop on Patent information retrieval
On compressing the textual web
Proceedings of the third ACM international conference on Web search and data mining
A web search engine model based on index-query bit-level compression
Proceedings of the 1st International Conference on Intelligent Semantic Web-Services and Applications
Sampled longest common prefix array
CPM'10 Proceedings of the 21st annual conference on Combinatorial pattern matching
Compression, indexing, and retrieval for massive string data
CPM'10 Proceedings of the 21st annual conference on Combinatorial pattern matching
Data structures: time, I/Os, entropy, joules!
ESA'10 Proceedings of the 18th annual European conference on Algorithms: Part II
Practical approaches to reduce the space requirement of lempel-ziv--based compressed text indices
Journal of Experimental Algorithmics (JEA)
Spatio-temporal range searching over compressed kinetic sensor data
ESA'10 Proceedings of the 18th annual European conference on Algorithms: Part I
Compressed self-indices supporting conjunctive queries on document collections
SPIRE'10 Proceedings of the 17th international conference on String processing and information retrieval
Space-efficient construction of Lempel-Ziv compressed text indexes
Information and Computation
Space-efficient substring occurrence estimation
Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Compressed string dictionaries
SEA'11 Proceedings of the 10th international conference on Experimental algorithms
Practical compressed document retrieval
SEA'11 Proceedings of the 10th international conference on Experimental algorithms
Distribution-aware compressed full-text indexes
ESA'11 Proceedings of the 19th European conference on Algorithms
Fixed block compression boosting in FM-indexes
SPIRE'11 Proceedings of the 18th international conference on String processing and information retrieval
Space efficient wavelet tree construction
SPIRE'11 Proceedings of the 18th international conference on String processing and information retrieval
Word-based self-indexes for natural language text
ACM Transactions on Information Systems (TOIS)
Practical compressed suffix trees
SEA'10 Proceedings of the 9th international conference on Experimental Algorithms
String matching with alphabet sampling
Journal of Discrete Algorithms
Revisiting bounded context block-sorting transformations
Software—Practice & Experience
Efficient in-memory top-k document retrieval
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
To index or not to index: time-space trade-offs in search engines with positional ranking functions
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Efficient indexing algorithms for approximate pattern matching in text
Proceedings of the Seventeenth Australasian Document Computing Symposium
Compressed suffix trees for repetitive texts
SPIRE'12 Proceedings of the 19th international conference on String Processing and Information Retrieval
Implicit indexing of natural language text by reorganizing bytecodes
Information Retrieval
Development of a Novel Compressed Index-Query Web Search Engine Model
International Journal of Information Technology and Web Engineering
Spaces, Trees, and Colors: The algorithmic landscape of document retrieval on sequences
ACM Computing Surveys (CSUR)
Hi-index | 0.00 |
A compressed full-text self-index represents a text in a compressed form and still answers queries efficiently. This represents a significant advancement over the (full-)text indexing techniques of the previous decade, whose indexes required several times the size of the text. Although it is relatively new, this algorithmic technology has matured up to a point where theoretical research is giving way to practical developments. Nonetheless this requires significant programming skills, a deep engineering effort, and a strong algorithmic background to dig into the research results. To date only isolated implementations and focused comparisons of compressed indexes have been reported, and they missed a common API, which prevented their re-use or deployment within other applications. The goal of this article is to fill this gap. First, we present the existing implementations of compressed indexes from a practitioner's point of view. Second, we introduce the Pizza&Chili site, which offers tuned implementations and a standardized API for the most successful compressed full-text self-indexes, together with effective test-beds and scripts for their automatic validation and test. Third, we show the results of our extensive experiments on these codes with the aim of demonstrating the practical relevance of this novel algorithmic technology.