Many content-oriented applications require a scalable text index. Building such an index is challenging: in addition to the logic of inserting and searching documents, developers must handle the usual concerns of a distributed environment, such as fault tolerance, incremental growth of the index cluster, and load balancing. We developed a distributed text index called HIndex by judiciously exploiting the control layer of HBase, an open-source implementation of Google's Bigtable. This allows HIndex to inherit HBase's support for availability, elasticity, and load balancing. We present the design, implementation, and performance evaluation of HIndex in this paper.
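To make the "logic of inserting and searching documents" concrete, the following is a minimal single-process sketch of an inverted index laid out the way a Bigtable-style table might store it, with the term as the row key and one entry per document. It is not the HIndex design, which this abstract does not describe; the class name `ToyTextIndex`, the row layout, and the tokenizer are all assumptions made for illustration, and none of the distributed concerns (fault tolerance, elasticity, load balancing) are modeled.

```python
from collections import defaultdict
import re


class ToyTextIndex:
    """In-memory stand-in for a table keyed by term (hypothetical layout)."""

    def __init__(self):
        # term -> {doc_id: term frequency}. In a Bigtable-style store the term
        # could serve as the row key and each doc_id as a column qualifier;
        # that mapping is an assumption, not something the paper states here.
        self.rows = defaultdict(dict)

    @staticmethod
    def tokenize(text):
        # Lowercase and split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def insert(self, doc_id, text):
        # Add or update one posting per term occurrence in the document.
        for term in self.tokenize(text):
            postings = self.rows[term]
            postings[doc_id] = postings.get(doc_id, 0) + 1

    def search(self, query):
        """Return the sorted ids of documents that contain every query term."""
        terms = self.tokenize(query)
        if not terms:
            return []
        result = None
        for term in terms:
            # .get avoids creating empty rows for terms that were never indexed.
            docs = set(self.rows.get(term, {}))
            result = docs if result is None else result & docs
        return sorted(result)


index = ToyTextIndex()
index.insert("d1", "HBase is an open source implementation of Bigtable")
index.insert("d2", "A scalable text index built on HBase")
print(index.search("hbase"))
print(index.search("text index"))
```

In a distributed setting, each term row would live on whichever server hosts its key range, which is where the availability and load-balancing support of the underlying store becomes relevant.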