Unique-order interpolative coding for fast querying and space-efficient indexing in information retrieval systems

Authors:
Cher-Sheng Cheng;Jean Jyh-Jiun Shann;Chung-Ping Chung
Affiliations:
Department of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C.;Department of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C.;Department of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, R. O. C.
Venue:
Information Processing and Management: an International Journal
Year:
2006

Abstract

This paper presents a method for reducing the size of the inverted file, the most suitable indexing structure for an information retrieval system (IRS). In an inverted file, the document identifiers for a given word are usually clustered. This clustering property can be used to reduce the size of the inverted file, but only if the compression method offers both good compression and fast decompression. We present a method, which we call unique-order interpolative coding, that speeds up the coding and decoding processes of interpolative coding by means of recursion elimination and loop unwinding. It calculates the lower and upper bounds of every document identifier for a binary code without a recursive process, so decompression time is greatly reduced. It also exploits document identifier clustering to compress the inverted file efficiently. Compared with other well-known compression methods, our method provides fast decoding and excellent compression. It can also support a self-indexing strategy. Our work therefore provides a feasible way to build a fast and space-economical IRS.
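To make the role of the lower and upper bounds concrete, the following is a minimal sketch of the conventional recursive form of binary interpolative coding, which is the baseline the abstract refers to. It is not the unique-order method itself, whose details the abstract does not give. The function names (`width`, `encode`, `decode`), the use of a plain fixed-width binary code, and the bit-string output are all assumptions made for illustration.

```python
def width(lo, hi):
    """Bits needed to tell apart the values lo..hi (0 when lo == hi)."""
    return (hi - lo).bit_length()


def encode(docs, lo, hi):
    """Encode a strictly increasing list of identifiers lying in [lo, hi].

    Illustrative only: returns a string of '0'/'1' characters.
    """
    if not docs:
        return ""
    mid = len(docs) // 2
    x = docs[mid]
    # `mid` identifiers lie below x and len(docs) - mid - 1 lie above it,
    # which tightens the interval in which x can fall.
    x_lo = lo + mid
    x_hi = hi - (len(docs) - mid - 1)
    w = width(x_lo, x_hi)
    code = format(x - x_lo, "b").zfill(w) if w else ""
    # Middle element first, then the left half, then the right half.
    return code + encode(docs[:mid], lo, x - 1) + encode(docs[mid + 1:], x + 1, hi)


def decode(bits, n, lo, hi):
    """Recover n identifiers in [lo, hi] from a bit string made by encode()."""
    pos = 0

    def rec(n, lo, hi):
        nonlocal pos
        if n == 0:
            return []
        mid = n // 2
        x_lo = lo + mid
        x_hi = hi - (n - mid - 1)
        w = width(x_lo, x_hi)
        x = x_lo + (int(bits[pos:pos + w], 2) if w else 0)
        pos += w
        left = rec(mid, lo, x - 1)
        right = rec(n - mid - 1, x + 1, hi)
        return left + [x] + right

    return rec(n, lo, hi)


docs = [3, 8, 9, 11, 12, 13, 17]
bits = encode(docs, 1, 20)
restored = decode(bits, len(docs), 1, 20)
```

In this sketch a fully clustered list such as the identifiers 1 to 7 within the range 1 to 7 needs no bits at all, because every interval collapses to a single value. The recursion in `decode` recomputes the bounds at every call, which is the overhead that the abstract says the unique-order method avoids.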