Leveraging a scalable row store to build a distributed text index

  • Authors:
  • Ning Li; Jun Rao; Eugene Shekita; Sandeep Tata

  • Affiliations:
  • Facebook, Palo Alto, CA, USA; IBM Almaden Research Center, San Jose, CA, USA; IBM Almaden Research Center, San Jose, CA, USA; IBM Almaden Research Center, San Jose, CA, USA

  • Venue:
  • Proceedings of the first international workshop on Cloud data management
  • Year:
  • 2009

Abstract

Many content-oriented applications require a scalable text index, and building one is challenging. Beyond the logic for inserting and searching documents, developers must handle the issues typical of a distributed environment, such as fault tolerance, incremental growth of the index cluster, and load balancing. We developed a distributed text index called HIndex by judiciously exploiting the control layer of HBase, an open-source implementation of Google's Bigtable. Building on HBase lets HIndex inherit its support for availability, elasticity, and load balancing. This paper presents the design, implementation, and performance evaluation of HIndex.