Enhancing Query Support in HBase via an Extended Coprocessors Framework

Authors:
Himanshu Vashishtha;Eleni Stroulia
Affiliations:
Department of Computing Science, University of Alberta;Department of Computing Science, University of Alberta
Venue:
ServiceWave'11 Proceedings of the 4th European conference on Towards a service-based internet
Year:
2011



Abstract

Cloud databases currently serve as a mainstream storage mechanism for unstructured data, primarily because of their high scalability and ease of availability. However, they still lag behind RDBMSs in the support they offer developers for querying data. Developing frameworks that support flexible data queries is a very active area of research. In this work we consider HBase, a popular cloud database inspired by Google's BigTable. Relying on the recent Coprocessor feature of HBase, we have developed a framework that developers can use to implement aggregate functions such as row count, max, and min. We further extended the existing Coprocessor framework to support Cursor functionality, so that a client can incrementally consume the Coprocessor-generated result. We demonstrate the effectiveness of our extension by comparatively evaluating it against the existing Scanner API with four queries on three different data sets.
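The two ideas in the abstract, computing aggregates where the data lives and consuming a result incrementally through a cursor, can be illustrated with a small self-contained sketch. This is not the authors' framework and it does not use the HBase API: regions are modeled as plain in-memory lists, and every name here (`CoprocessorSketch`, `Partial`, `aggregateRegion`, `aggregateTable`, `Cursor`) is hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Illustrative only: an in-memory stand-in, not HBase code.
public class CoprocessorSketch {

    /** Partial aggregate (row count, max, min) produced for one region. */
    static final class Partial {
        final long rowCount;
        final long max;
        final long min;

        Partial(long rowCount, long max, long min) {
            this.rowCount = rowCount;
            this.max = max;
            this.min = min;
        }

        /** Combine two partial results into one. */
        Partial merge(Partial other) {
            return new Partial(rowCount + other.rowCount,
                    Math.max(max, other.max),
                    Math.min(min, other.min));
        }
    }

    /** Assumed server-side step: aggregate one region's values locally. */
    static Partial aggregateRegion(List<Long> regionValues) {
        Partial acc = new Partial(0, Long.MIN_VALUE, Long.MAX_VALUE);
        for (long v : regionValues) {
            acc = acc.merge(new Partial(1, v, v));
        }
        return acc;
    }

    /** Assumed client-side step: merge small per-region partials instead of pulling every row. */
    static Partial aggregateTable(List<List<Long>> regions) {
        Partial total = new Partial(0, Long.MIN_VALUE, Long.MAX_VALUE);
        for (List<Long> region : regions) {
            total = total.merge(aggregateRegion(region));
        }
        return total;
    }

    /** Cursor that hands out an already computed result in fixed-size batches. */
    static final class Cursor<T> implements Iterator<List<T>> {
        private final List<T> result;
        private final int batchSize;
        private int position = 0;

        Cursor(List<T> result, int batchSize) {
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            this.result = result;
            this.batchSize = batchSize;
        }

        @Override
        public boolean hasNext() {
            return position < result.size();
        }

        @Override
        public List<T> next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            int end = Math.min(position + batchSize, result.size());
            List<T> batch = new ArrayList<>(result.subList(position, end));
            position = end;
            return batch;
        }
    }

    public static void main(String[] args) {
        List<List<Long>> regions = Arrays.asList(
                Arrays.asList(3L, 9L, 4L),
                Arrays.asList(7L, 1L));
        Partial p = aggregateTable(regions);
        System.out.println("rows=" + p.rowCount + " max=" + p.max + " min=" + p.min);

        Cursor<Integer> cursor = new Cursor<>(Arrays.asList(1, 2, 3, 4, 5), 2);
        while (cursor.hasNext()) {
            System.out.println(cursor.next());
        }
    }
}
```

In this sketch only one small `Partial` per region reaches the client, in contrast to a scan that ships every row, and the `Cursor` lets the caller stop after any batch rather than receiving the whole result at once. How the actual framework distributes the computation and maintains cursor state is not described in the abstract.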