Word-based byte-oriented compression has proven successful on large natural language text databases, providing competitive compression ratios, fast random access, and direct sequential searching. We show that by merely rearranging the target symbols (bytes) of the compressed text into a tree-shaped structure, using negligible additional space, we obtain a new implicitly indexed representation of the compressed text in which search times improve drastically. The occurrences of a word can be listed directly, without any text scanning. More generally, any inverted-index-like capability, such as efficient phrase searches, can be emulated without storing any inverted list information. We show experimentally that, when little extra space is used, our proposal is much more efficient than sequential searches over compressed text, and also more efficient than explicit inverted indexes and other types of indexes. Our representation is especially successful when searching for single words and short phrases.
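The abstract does not spell out the codeword format or the exact tree layout. The sketch below is a minimal illustration of the idea under stated assumptions, not the authors' implementation.

It assumes a byte code in the style of End-Tagged Dense Code (ETDC), where continuer bytes and end-tagged bytes are disjoint. It also assumes the following layout:

- The root node holds the first byte of every codeword, in text order.
- The node keyed by a byte prefix holds, in text order, the next byte of every codeword that starts with that prefix.

The names `ByteTreeText`, `extract`, `locate`, `phrase` and the parameter `b` are hypothetical. `rank` and `select` are naive linear scans, whereas a practical index would answer them with small precomputed counters. Phrase search simply locates the first word and verifies the following positions. That is one assumed strategy, not necessarily the one evaluated in the paper.

```python
from collections import Counter, defaultdict


def etdc_encode(r, b=128):
    """ETDC-style codeword for the word of 0-based frequency rank r (assumed code):
    bytes in [0, b) continue a codeword, bytes in [b, 2b) end it."""
    out = [b + r % b]            # last byte carries the end tag
    r //= b
    while r > 0:
        r -= 1
        out.append(r % b)        # continuer bytes, least significant first
        r //= b
    return tuple(reversed(out))


def rank(seq, sym, i):
    """Occurrences of sym in seq[:i] (naive linear scan)."""
    return seq[:i].count(sym)


def select(seq, sym, j):
    """Position of the j-th (0-based) occurrence of sym in seq (naive scan)."""
    for pos, x in enumerate(seq):
        if x == sym:
            if j == 0:
                return pos
            j -= 1
    raise IndexError("fewer than j+1 occurrences")


class ByteTreeText:
    """Hypothetical byte-tree layout: node `prefix` lists, in text order, the
    byte that follows `prefix` in every codeword that starts with `prefix`.
    The root (prefix ()) holds the first byte of every word in the text."""

    def __init__(self, words, b=128):
        self.b, self.n = b, len(words)
        freq = Counter(words)
        vocab = sorted(freq, key=lambda w: (-freq[w], w))  # most frequent first
        self.code = {w: etdc_encode(r, b) for r, w in enumerate(vocab)}
        self.word_of = {c: w for w, c in self.code.items()}
        self.nodes = defaultdict(list)
        for w in words:
            cw = self.code[w]
            for k in range(len(cw)):
                self.nodes[cw[:k]].append(cw[k])

    def extract(self, i):
        """Word at text position i, descending from the root with rank."""
        prefix, pos = (), i
        while True:
            byte = self.nodes[prefix][pos]
            if byte >= self.b:                   # end-tagged byte: codeword done
                return self.word_of[prefix + (byte,)]
            pos = rank(self.nodes[prefix], byte, pos)  # index in child node
            prefix += (byte,)

    def locate(self, word):
        """Text positions of word: select in its deepest node, then climb to
        the root with select; the text is never decompressed."""
        cw = self.code.get(word)
        if cw is None:
            return []
        leaf = self.nodes[cw[:-1]]
        hits = []
        for j in range(rank(leaf, cw[-1], len(leaf))):
            pos = select(leaf, cw[-1], j)
            for k in range(len(cw) - 1, 0, -1):  # map position to parent node
                pos = select(self.nodes[cw[:k - 1]], cw[k - 1], pos)
            hits.append(pos)
        return hits

    def phrase(self, ws):
        """Start positions of phrase ws: locate ws[0], verify the rest by extract."""
        return [p for p in self.locate(ws[0])
                if p + len(ws) <= self.n
                and all(self.extract(p + k) == ws[k] for k in range(1, len(ws)))]
```

For example, take `words = "the cat sat on the mat the cat ran to a hat".split()` and `t = ByteTreeText(words, b=2)`. Then `t.locate("the")` returns `[0, 4, 6]` and `t.phrase(["the", "cat"])` returns `[0, 6]`. The tiny `b=2` is used only to force two- and three-byte codewords in a short example.

Climbing from a node to its parent with `select` is valid because a continuer byte never ends a codeword. The k-th entry of a child node therefore corresponds exactly to the k-th occurrence of its byte in the parent node.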