Bitlist: new full-text index for low space cost and efficient keyword search

  • Authors:
  • Weixiong Rao; Lei Chen; Pan Hui; Sasu Tarkoma

  • Affiliations:
  • School of Software Engineering, Tongji University, China and Department of Comp. Sci., University of Helsinki, Finland; Department of Comp. Sci. and Eng., Hong Kong University of Sci. and Tech.; Department of Comp. Sci. and Eng., Hong Kong University of Sci. and Tech. and Telekom Innovation Laboratories, Berlin, Germany; Department of Comp. Sci., University of Helsinki, Finland

  • Venue:
  • Proceedings of the VLDB Endowment
  • Year:
  • 2013

Abstract

Web search engines face significant performance challenges caused by the huge number of Web pages and an ever-growing number of Web users. The key to addressing these challenges is a compact structure that indexes Web documents in little space while still processing keyword searches very fast. Unfortunately, current solutions typically treat space optimization and search improvement separately. As a result, they either save space at the cost of inefficient search, or allow fast keyword search at the cost of a huge space requirement. In this paper, we propose a novel structure, bitlist, that combines a low space requirement with fast keyword search. Specifically, based on a simple yet very efficient encoding scheme, bitlist uses a single number to encode a set of integer document IDs, which keeps space low, and adopts fast bitwise operations for very efficient Boolean keyword search. Our extensive experimental results on real and synthetic data sets verify that bitlist outperforms the recently proposed solution, inverted list compression [23, 22], using 36.71% less space and processing queries 61.91% faster, and achieves running time comparable to [8] with significantly lower space.
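The general idea of packing a set of document IDs into one number and answering Boolean queries with bitwise operations can be illustrated with a minimal sketch. It uses a plain, uncompressed bitmap stored in a Python integer, which is an assumption made for illustration and not the encoding scheme of the paper; the names encode, decode, search_and, search_or, and the toy index are all hypothetical.

```python
from functools import reduce


def encode(doc_ids):
    """Pack non-negative document IDs into one integer (bit i set <=> ID i present)."""
    bits = 0
    for doc_id in doc_ids:
        bits |= 1 << doc_id
    return bits


def decode(bits):
    """Recover the sorted list of document IDs from a packed integer."""
    ids = []
    position = 0
    while bits:
        if bits & 1:
            ids.append(position)
        bits >>= 1
        position += 1
    return ids


def search_and(index, terms):
    """Conjunctive query: documents that contain every term."""
    if not terms:
        return []
    packed = [index.get(term, 0) for term in terms]
    return decode(reduce(lambda a, b: a & b, packed))


def search_or(index, terms):
    """Disjunctive query: documents that contain at least one term."""
    packed = (index.get(term, 0) for term in terms)
    return decode(reduce(lambda a, b: a | b, packed, 0))


# Toy index (illustrative data only): term -> packed set of document IDs.
index = {
    "web": encode({1, 3, 4, 7}),
    "search": encode({3, 4, 9}),
    "index": encode({0, 4, 7}),
}
```

In this sketch, an AND query is a single bitwise intersection per additional term, but the integer grows with the largest document ID, so it shows only the query side of the trade-off and none of the space savings reported in the abstract.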