Enhanced byte codes with restricted prefix properties

Authors:
J. Shane Culpepper; Alistair Moffat
Affiliations:
NICTA Victoria Laboratory, Department of Computer Science and Software Engineering, The University of Melbourne, Victoria, Australia (both authors)
Venue:
SPIRE'05 Proceedings of the 12th international conference on String Processing and Information Retrieval
Year:
2005

Abstract

Byte codes have a number of properties that make them attractive for practical compression systems: they are relatively easy to construct; they decode quickly; and they can be searched using standard byte-aligned string matching techniques. In this paper we describe a new type of byte code in which the first byte of each codeword completely specifies the number of bytes that comprise the suffix of the codeword. Our mechanism gives more flexible coding than previous constrained byte codes, and hence better compression. The structure of the code also suggests a heuristic approximation that allows savings to be made in the prelude that describes the code. We present experimental results that compare our new method with previous approaches to byte coding, in terms of both compression effectiveness and decoding throughput.
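
The defining property above, that the first byte alone fixes how many suffix bytes follow, can be illustrated with a minimal sketch. It is not the authors' implementation: the partition of first-byte values into ranges (the hypothetical `counts` tuple), the helper names `make_code`, `encode` and `decode`, and the convention that symbols are identified by their frequency rank (0 = most frequent) are all assumptions made for illustration.

```python
from bisect import bisect_right
from itertools import accumulate

RADIX = 256  # number of distinct byte values


def make_code(counts):
    """Build lookup tables for a code in which counts[k] first-byte
    values introduce a k-byte suffix (hypothetical parameterisation)."""
    if sum(counts) > RADIX:
        raise ValueError("first-byte ranges must fit in one byte")
    # first_byte_start[k]: smallest first byte that signals a k-byte suffix
    first_byte_start = [0] + list(accumulate(counts))
    # rank_start[k]: smallest symbol rank that receives a (k+1)-byte codeword
    sizes = [c * RADIX ** k for k, c in enumerate(counts)]
    rank_start = [0] + list(accumulate(sizes))
    return first_byte_start, rank_start


def encode(rank, code):
    """Return the codeword (as bytes) for the symbol with this rank."""
    first_byte_start, rank_start = code
    for k in range(len(rank_start) - 1):
        if rank < rank_start[k + 1]:
            prefix, suffix = divmod(rank - rank_start[k], RADIX ** k)
            return bytes([first_byte_start[k] + prefix]) + suffix.to_bytes(k, "big")
    raise ValueError("rank too large for this code")


def decode(data, code):
    """Decode a concatenation of codewords back into a list of ranks."""
    first_byte_start, rank_start = code
    ranks, pos = [], 0
    while pos < len(data):
        first = data[pos]
        # The first byte by itself determines the suffix length k.
        k = bisect_right(first_byte_start, first) - 1
        if k >= len(rank_start) - 1:
            raise ValueError("first byte is not used by this code")
        if pos + 1 + k > len(data):
            raise ValueError("truncated codeword")
        suffix = int.from_bytes(data[pos + 1:pos + 1 + k], "big")
        ranks.append(rank_start[k] + (first - first_byte_start[k]) * RADIX ** k + suffix)
        pos += 1 + k
    return ranks


# Assumed split: 176 one-byte, 64 two-byte and 16 three-byte prefixes.
code = make_code((176, 64, 16))
message = [0, 175, 176, 16560, 1065135]
stream = b"".join(encode(r, code) for r in message)
assert decode(stream, code) == message
```

Because codeword boundaries can be found by inspecting first bytes only, a decoder of this shape never has to test a continuation flag in every byte. Changing the `counts` tuple trades the number of short codewords against the total number of symbols that can be represented.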