Byte codes have a number of properties that make them attractive for practical compression systems: they are relatively easy to construct, they decode quickly, and they can be searched using standard byte-aligned string matching techniques. In this paper we describe a new type of byte code in which the first byte of each codeword completely specifies how many further bytes make up the suffix of that codeword. Our mechanism allows more flexible coding than previous constrained byte codes, and hence better compression. The structure of the code also suggests a heuristic approximation that reduces the size of the prelude that describes the code. We present experimental results that compare our new method with previous approaches to byte coding, in terms of both compression effectiveness and decoding throughput.
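The codeword structure described above can be illustrated with a small sketch. This is not the authors' implementation: it assumes symbols have already been ranked by decreasing frequency (rank 0 is the most frequent), a radix of 256, and at most three suffix bytes. The parameters `v1`, `v2`, `v3` (the number of first-byte values reserved for codewords with zero, one and two suffix bytes; all remaining first-byte values signal three suffix bytes) and the function names are hypothetical.

```python
R = 256  # radix: each codeword byte carries one base-256 digit


def _counts(v1, v2, v3):
    """Return first-byte allocations for suffix lengths 0..3 (hypothetical parameters)."""
    v4 = R - v1 - v2 - v3
    if min(v1, v2, v3, v4) < 0:
        raise ValueError("v1 + v2 + v3 must not exceed 256")
    return (v1, v2, v3, v4)


def encode(rank, v1, v2, v3):
    """Encode a 0-based symbol rank as a restricted-prefix byte codeword."""
    base = 0               # first rank coded with k suffix bytes
    first_byte_start = 0   # first byte value used for k suffix bytes
    for k, v in enumerate(_counts(v1, v2, v3)):  # k = number of suffix bytes
        block = v * R ** k  # how many ranks get exactly k suffix bytes
        if rank < base + block:
            prefix, suffix_value = divmod(rank - base, R ** k)
            return bytes([first_byte_start + prefix]) + suffix_value.to_bytes(k, "big")
        base += block
        first_byte_start += v
    raise ValueError("rank too large for these parameters")


def build_table(v1, v2, v3):
    """Map every first-byte value to (suffix length, first rank it covers)."""
    table = []
    base = 0
    for k, v in enumerate(_counts(v1, v2, v3)):
        for j in range(v):
            table.append((k, base + j * R ** k))
        base += v * R ** k
    return table  # exactly 256 entries, indexed by first byte


def decode(data, v1, v2, v3):
    """Decode a concatenation of codewords back into symbol ranks."""
    table = build_table(v1, v2, v3)
    ranks = []
    i = 0
    while i < len(data):
        # The first byte alone tells us how many suffix bytes follow.
        k, start = table[data[i]]
        if i + 1 + k > len(data):
            raise ValueError("truncated codeword")
        suffix = int.from_bytes(data[i + 1 : i + 1 + k], "big")
        ranks.append(start + suffix)
        i += 1 + k
    return ranks
```

For example, with `v1, v2, v3 = 100, 100, 50`, ranks 0 to 99 get one-byte codewords, and `encode(25699, 100, 100, 50)` returns `b'\xc7\xff'` (first byte 199, suffix byte 255). Because the decoder needs only a 256-entry table indexed by the first byte, each codeword's length is known as soon as that byte is read; tuning `v1`, `v2`, `v3` to the symbol distribution is where the extra flexibility over more constrained byte codes comes from.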