Compressing inverted files in scalable information systems by binary decision diagram encoding

  • Authors:
  • Chung-Hung Lai; Tien-Fu Chen

  • Affiliations:
  • National Chung Cheng University, Chiayi, Taiwan; National Chung Cheng University, Chiayi, Taiwan

  • Venue:
  • Proceedings of the 2001 ACM/IEEE conference on Supercomputing
  • Year:
  • 2001

Abstract

One of the key challenges in managing very large volumes of data in scalable information retrieval systems is providing fast access through keyword searches. The major data structure in an information retrieval system is the inverted file, which records the positions of each term in the documents. As the collection grows, the numbers of terms and documents increase significantly, and so does the size of the inverted files. Approaches that reduce the size of the inverted file without sacrificing query efficiency are therefore important to the success of scalable information systems. In this paper, we propose a compression approach based on Binary Decision Diagram (BDD) encoding, which extracts all possible ordering correlations among a large number of documents in order to minimize the posting representation. Another advantage of BDDs is that BDD expressions can efficiently perform Boolean queries, which are very common in retrieval systems. Experimental results show that the BDD scheme significantly improves the compression ratios of the inverted files.
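The abstract describes the approach only at a high level, so the following is a minimal sketch of the general idea under explicit assumptions, not the authors' encoder. It treats each posting list as a set of document IDs (word positions are ignored), writes each ID as an `nbits`-bit binary number, and stores the list as a reduced ordered BDD of its characteristic function, so that regular patterns of IDs share nodes. A conjunctive query then becomes a BDD AND that runs directly on the encoded lists. The class `BDD`, its methods, `build_inverted_index`, and the sample documents are all hypothetical names chosen for illustration.

```python
from collections import defaultdict


class BDD:
    """Minimal reduced ordered BDD over `nbits` document-ID bits (MSB first).

    Hypothetical helper for illustration; not the paper's encoder.
    Node 0 is the FALSE terminal, node 1 the TRUE terminal.
    """

    def __init__(self, nbits):
        self.nbits = nbits
        self.nodes = [(nbits, 0, 0), (nbits, 1, 1)]  # node id -> (var, low, high)
        self.unique = {}     # hash-consing table: (var, low, high) -> node id
        self.and_cache = {}  # memo table for apply_and

    def mk(self, var, low, high):
        if low == high:  # reduction: a test whose branches agree is redundant
            return low
        key = (var, low, high)
        if key not in self.unique:  # sharing: reuse an identical node
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def from_set(self, doc_ids):
        """Encode a posting list as the characteristic function of its IDs.
        Assumes every ID satisfies 0 <= id < 2**nbits."""
        def build(var, ids):
            if not ids:
                return 0
            if var == self.nbits:  # all bits fixed: exactly one ID matched
                return 1
            shift = self.nbits - 1 - var
            low = build(var + 1, [d for d in ids if not (d >> shift) & 1])
            high = build(var + 1, [d for d in ids if (d >> shift) & 1])
            return self.mk(var, low, high)
        return build(0, sorted(set(doc_ids)))

    def apply_and(self, u, v):
        """Intersect two posting lists directly on their BDDs."""
        if u == 0 or v == 0:
            return 0
        if u == 1 or v == 1:
            return v if u == 1 else u
        key = (min(u, v), max(u, v))  # AND is commutative
        if key not in self.and_cache:
            (vu, lu, hu), (vv, lv, hv) = self.nodes[u], self.nodes[v]
            var = min(vu, vv)
            if vu != var:  # u does not test `var`: both cofactors are u
                lu = hu = u
            if vv != var:
                lv = hv = v
            self.and_cache[key] = self.mk(
                var, self.apply_and(lu, lv), self.apply_and(hu, hv))
        return self.and_cache[key]

    def to_set(self, u):
        """Decode a BDD back into a sorted list of document IDs."""
        out = []
        def walk(node, var, prefix):
            if node == 0:
                return
            if var == self.nbits:
                out.append(prefix)
                return
            node_var, low, high = self.nodes[node]
            if node_var != var:  # skipped variable: either bit value is allowed
                low = high = node
            walk(low, var + 1, prefix << 1)
            walk(high, var + 1, (prefix << 1) | 1)
        walk(u, 0, 0)
        return out


def build_inverted_index(docs):
    """Map each term to the set of IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


docs = ["bdd encoding compresses postings", "inverted file compression",
        "bdd boolean queries", "inverted file bdd"]
index = build_inverted_index(docs)
bdd = BDD(nbits=3)  # 3 bits address up to 8 documents
postings = {term: bdd.from_set(ids) for term, ids in index.items()}
hits = bdd.to_set(bdd.apply_and(postings["bdd"], postings["inverted"]))
print(hits)  # [3]

# A regular posting list (every even ID below 256) needs a single decision node.
dense = BDD(nbits=8)
evens = dense.from_set(range(0, 256, 2))
print(len(dense.nodes))  # 3: the two terminals plus one node testing the lowest bit
```

The usage lines show, in miniature, the two properties the abstract highlights: a Boolean AND evaluated without decompressing either list, and a regular posting list collapsing to a single decision node. How well real posting lists compress under this kind of encoding depends on how document IDs are assigned and on the BDD variable ordering; the sketch fixes both arbitrarily (IDs in input order, most significant bit first).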