Experimenting Lucene Index on HBase in an HPC Environment

Authors:
Xiaoming Gao; Vaibhav Nachankar; Judy Qiu
Affiliations:
Indiana University, Bloomington, IN, USA; Indiana University, Bloomington, IN, USA; Indiana University, Bloomington, IN, USA
Venue:
Proceedings of the First Annual Workshop on High Performance Computing Meets Databases
Year:
2011

Abstract

Data-intensive computing has been a major focus of the scientific computing community over the past several years, and many technologies and systems have been developed to efficiently store and serve terabytes or even petabytes of data. One important effort in this direction is the HBase system. Modeled after Google's Bigtable, HBase supports reliable storage of, and efficient access to, billions of rows of structured data. However, it does not provide an efficient search mechanism based on column values. To enable efficient search over text data, this paper proposes a search framework based on Lucene full-text indices implemented as HBase tables. By leveraging the distributed architecture of HBase, we expect our search system to achieve high performance and availability as well as excellent scalability and flexibility. Our experiments use data from a real digital library application and are carried out on a dynamically constructed HBase deployment in a high-performance computing (HPC) environment. We have completed the system design and data loading for this project; index building and performance testing will be covered in future work.
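The abstract does not describe the index schema, so the following is only a rough illustration of why storing an inverted index in an HBase table helps with search on column values. It is a minimal Python sketch under stated assumptions: plain dictionaries stand in for HBase tables (row key mapped to `family:qualifier` cells), and the hypothetical index layout uses the term as the row key, one `postings:<doc id>` qualifier per document, and the term frequency as the cell value. This is not claimed to be the authors' schema. The function names `build_index`, `search` and `scan_by_column_value`, the `md:title` column, and the sample records are all hypothetical, and no HBase or Lucene API is used.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stand-in for an HBase table: row key -> {"family:qualifier": value}.
Table = dict[str, dict[str, str]]

def tokenize(text: str) -> list[str]:
    # Crude analyzer (assumption): lowercase alphanumeric runs only.
    # Real Lucene analyzers also handle stemming, stop words, etc.
    return re.findall(r"[a-z0-9]+", text.lower())

def scan_by_column_value(data: Table, column: str, word: str) -> list[str]:
    # Without an index, searching on a column value means visiting every row.
    return sorted(row for row, cols in data.items()
                  if word.lower() in tokenize(cols.get(column, "")))

def build_index(data: Table, column: str) -> Table:
    # Hypothetical inverted-index table: row key = term,
    # qualifier = "postings:<doc id>", value = term frequency in that document.
    index = defaultdict(dict)
    for doc_id, cols in data.items():
        for term, freq in Counter(tokenize(cols.get(column, ""))).items():
            index[term][f"postings:{doc_id}"] = str(freq)
    return dict(index)

def search(index: Table, word: str) -> list[str]:
    # A single row-key lookup (analogous to an HBase Get on the term's row)
    # returns every document that contains the term.
    row = index.get(word.lower(), {})
    return sorted(qualifier.split(":", 1)[1] for qualifier in row)

# Usage with made-up library records.
books: Table = {
    "book1": {"md:title": "Distributed storage systems"},
    "book2": {"md:title": "Text search with inverted indices"},
    "book3": {"md:title": "Distributed text search"},
}
idx = build_index(books, "md:title")
print(search(idx, "distributed"))                         # ['book1', 'book3']
print(scan_by_column_value(books, "md:title", "search"))  # ['book2', 'book3']
```

The point of the layout is that HBase retrieves rows efficiently by row key, so keying postings by term turns a search on column values into a key lookup instead of a full table scan. A practical Lucene-on-HBase design would also need to handle analysis, term positions, and incremental index updates, which this sketch ignores.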