Leveraging a scalable row store to build a distributed text index

Authors:
Ning Li; Jun Rao; Eugene Shekita; Sandeep Tata
Affiliations:
Facebook, Palo Alto, CA, USA; IBM Almaden Research Center, San Jose, CA, USA; IBM Almaden Research Center, San Jose, CA, USA; IBM Almaden Research Center, San Jose, CA, USA
Venue:
Proceedings of the first international workshop on Cloud data management
Year:
2009

Abstract

Many content-oriented applications require a scalable text index, and building one is challenging. Beyond the logic for inserting and searching documents, developers must handle the issues typical of a distributed environment, such as fault tolerance, incremental growth of the index cluster, and load balancing. We developed a distributed text index called HIndex by judiciously exploiting the control layer of HBase, an open-source implementation of Google's Bigtable. This lets HIndex inherit HBase's support for availability, elasticity, and load balancing. In this paper, we present the design, implementation, and performance evaluation of HIndex.
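The abstract does not spell out HIndex's internal layout, so the following is only a toy illustration of the general idea of layering a text index on a range-partitioned row store. It assumes a document-partitioned design: each key range (loosely analogous to an HBase region) keeps its own in-memory inverted index, an overfull range splits at its median row key, and a term query is fanned out to every range and the hits are merged. The class names `Region` and `ShardedTextIndex`, the `max_docs_per_region` split threshold, and whitespace tokenization are all hypothetical. This is not HIndex's implementation, and it omits persistence, replication, failover, and ranking.

```python
from bisect import bisect_right
from collections import defaultdict


class Region:
    """Hypothetical key range [start, end) holding a local inverted index."""

    def __init__(self, start, end):
        self.start = start                 # inclusive lower bound ("" acts as -inf)
        self.end = end                     # exclusive upper bound (None acts as +inf)
        self.docs = {}                     # row key -> document text
        self.postings = defaultdict(set)   # term -> set of row keys containing it

    def add(self, key, text):
        self.docs[key] = text
        for term in text.lower().split():  # naive whitespace tokenization
            self.postings[term].add(key)

    def search(self, term):
        # .get() avoids creating empty posting lists for unseen terms
        return set(self.postings.get(term.lower(), ()))


class ShardedTextIndex:
    """Toy document-partitioned text index over a range-partitioned key space."""

    def __init__(self, max_docs_per_region=2):
        self.regions = [Region("", None)]  # start with one region covering all keys
        self.max_docs = max_docs_per_region

    def _locate(self, key):
        # Regions are kept sorted by start key; find the one whose range holds key.
        starts = [r.start for r in self.regions]
        return bisect_right(starts, key) - 1

    def insert(self, key, text):
        i = self._locate(key)
        region = self.regions[i]
        region.add(key, text)
        if len(region.docs) > self.max_docs:
            self._split(i)                 # grow the cluster by splitting the range

    def _split(self, i):
        old = self.regions[i]
        keys = sorted(old.docs)
        mid = keys[len(keys) // 2]         # median row key becomes the split point
        left, right = Region(old.start, mid), Region(mid, old.end)
        for k in keys:                     # rebuild each half's local index
            (left if k < mid else right).add(k, old.docs[k])
        self.regions[i:i + 1] = [left, right]

    def search(self, term):
        # Scatter the query to every region, then gather and merge the hits.
        hits = set()
        for region in self.regions:
            hits |= region.search(term)
        return sorted(hits)


idx = ShardedTextIndex(max_docs_per_region=2)
idx.insert("doc1", "HBase stores rows")
idx.insert("doc2", "Bigtable stores rows in tablets")
idx.insert("doc3", "a text index over HBase")
idx.insert("doc4", "load balancing moves regions")
print(len(idx.regions), idx.search("hbase"))
```

In this sketch, growth and partitioning are driven purely by row-key ranges, which is the property that lets a row store's own split and reassignment machinery manage index shards. A real system would delegate that bookkeeping to the store's master and region servers instead of a single in-process list.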