We introduce new compressed inverted indexes for highly repetitive document collections. They are based on run-length, Lempel-Ziv, or grammar-based compression of the differential inverted lists, instead of encoding each gap individually, as is the usual practice. We show that our compression methods use significantly less space than classical compression, at the price of moderate slowdowns. Moreover, many of our methods are universal, that is, they do not need to know the versioning structure of the collection. We also include compressed self-indexes in the comparison. We show that these self-indexes can compress much further, using a small fraction of the space required by our new inverted indexes, yet they are orders of magnitude slower.
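The following minimal Python sketch illustrates the difference between encoding gaps one by one and compressing the differential list as a whole. It assumes that postings are sorted integer docIDs and that successive versions of a document receive consecutive IDs. That numbering is an illustrative assumption. `to_gaps` builds the differential list. `run_length` and `lz77_parse` then compress the whole gap sequence, exploiting the runs and repeated gap patterns that versioning tends to produce. All function names here are hypothetical. The LZ77 parse is a naive, quadratic textbook version, not one of the Lempel-Ziv variants evaluated in the work, and grammar-based compression is not shown.

```python
from itertools import groupby


def to_gaps(postings):
    """Differential list: first docID, then differences between consecutive docIDs."""
    gaps, prev = [], 0
    for doc in postings:
        gaps.append(doc - prev)
        prev = doc
    return gaps


def from_gaps(gaps):
    """Inverse of to_gaps: prefix sums recover the original docIDs."""
    docs, total = [], 0
    for g in gaps:
        total += g
        docs.append(total)
    return docs


def run_length(gaps):
    """Collapse each maximal run of equal gaps into a (value, count) pair."""
    return [(value, len(list(run))) for value, run in groupby(gaps)]


def run_length_decode(runs):
    """Expand (value, count) pairs back into the gap sequence."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out


def lz77_parse(seq):
    """Naive greedy LZ77 parse (hypothetical helper, O(n^2) or worse).

    Each phrase is (source, length, next): copy `length` items starting at
    position `source` of the already-decoded output (overlap allowed), then
    append the literal `next`, which is None only if the phrase ends the input.
    """
    phrases, i, n = [], 0, len(seq)
    while i < n:
        best_src, best_len = 0, 0
        for j in range(i):  # try every earlier starting position
            length = 0
            while i + length < n and seq[j + length] == seq[i + length]:
                length += 1
            if length > best_len:
                best_src, best_len = j, length
        nxt = seq[i + best_len] if i + best_len < n else None
        phrases.append((best_src, best_len, nxt))
        i += best_len + 1
    return phrases


def lz77_decode(phrases):
    """Rebuild the sequence; copying item by item handles overlapping sources."""
    out = []
    for src, length, nxt in phrases:
        for k in range(length):
            out.append(out[src + k])
        if nxt is not None:
            out.append(nxt)
    return out
```

As an example, suppose a word occurs in versions 1 to 4 of one document (IDs 0 to 4) and in versions 1 to 4 of the next (IDs 5 to 9). The postings `[1, 2, 3, 4, 6, 7, 8, 9]` give the gaps `[1, 1, 1, 1, 2, 1, 1, 1]`. Run-length coding reduces these to `[(1, 4), (2, 1), (1, 3)]`. LZ77 reduces them to three phrases, the last of which copies the earlier run of 1s. A per-gap encoder would still spend one code per posting. None of these functions needs to know which IDs belong to which document. This is in the spirit of the universality claim above.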