Fast multi-fields query processing in bigtable based cloud systems

Authors:
Haiping Wang;Xiang Ci;Xiaofeng Meng
Affiliations:
School of Information, Renmin University, Beijing, China;School of Information, Renmin University, Beijing, China;School of Information, Renmin University, Beijing, China
Venue:
WAIM'13 Proceedings of the 14th international conference on Web-Age Information Management
Year:
2013

Abstract

With the rapid increase in data sizes, enterprise applications are migrating their backend data management and analytic systems to cloud-based data management systems. Bigtable is one of the major data models used by cloud storage systems as their storage layer. Such systems provide high scalability and schema flexibility, and support efficient point and range queries on rowkeys. However, Bigtable-based systems have limited support for non-rowkey queries and multi-field queries, because such queries incur considerable overhead from extra data scans. In this paper, we develop a system, TNBGR (Telecom Network Browsing Gateway Records), for managing and querying large-scale telecommunication data. TNBGR is built on top of HBase and MapReduce, with a focus on optimizing multi-field query processing. TNBGR provides a novel application- and system-resource-aware data allocation strategy that minimizes data access through multi-layer region partitioning, resource parameterization, and balanced region distribution. Query composition dynamically updates application parameters based on tracked system statistics and automatically translates queries for MapReduce. Through additional query optimization that improves region locality, TNBGR achieves high efficiency in supporting multi-field queries. The experimental results show that our solution improves query performance by factors of about 5 and 18, respectively.
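
The gap between rowkey queries and non-rowkey queries described in the abstract can be illustrated with a minimal sketch. It is not TNBGR or the HBase API: `ToyOrderedTable`, its methods, and the `cell` and `proto` fields are hypothetical names chosen for illustration, and the table is a single in-memory sorted list standing in for a table ordered by rowkey. The sketch only counts how many rows each kind of query has to touch.

```python
from bisect import bisect_left


class ToyOrderedTable:
    """Hypothetical stand-in for a table whose rows are sorted by rowkey."""

    def __init__(self, rows):
        # rows: iterable of (rowkey, {field: value}) pairs
        self.rows = sorted(rows, key=lambda r: r[0])
        self.keys = [k for k, _ in self.rows]

    def range_scan(self, start, stop):
        """Return (rows with start <= rowkey < stop, number of rows touched)."""
        lo = bisect_left(self.keys, start)
        hi = bisect_left(self.keys, stop)
        return self.rows[lo:hi], hi - lo

    def field_scan(self, **conditions):
        """Filter on non-rowkey fields; every row has to be examined."""
        hits = [
            (k, v) for k, v in self.rows
            if all(v.get(f) == want for f, want in conditions.items())
        ]
        return hits, len(self.rows)


# 1000 synthetic records; the field names are illustrative assumptions.
table = ToyOrderedTable(
    (f"user{i:04d}", {"cell": i % 10, "proto": "http" if i % 2 else "wap"})
    for i in range(1000)
)

by_key, touched_key = table.range_scan("user0100", "user0110")
by_field, touched_field = table.field_scan(cell=3, proto="http")

print(len(by_key), touched_key)      # rows returned, rows touched (rowkey range)
print(len(by_field), touched_field)  # rows returned, rows touched (multi-field filter)
```

In this toy setting the rowkey range query touches only the rows it returns, while the two-field filter touches the whole table regardless of how selective it is. That extra scanning is the overhead the abstract refers to; how TNBGR's partitioning and region distribution reduce it is not shown here.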