Ad-hoc aggregate query processing algorithms based on bit-store for query intensive applications in cloud computing

Authors:
Donghua Yang;Yuqiang Feng;Ye Yuan;Xixian Han;Jinbao Wang;Jianzhong Li
Affiliations:
The Academy of Fundamental and Interdisciplinary Sciences, Harbin Institute of Technology, Harbin 150001, China and School of Management, Harbin Institute of Technology, Harbin 150001, China;School of Management, Harbin Institute of Technology, Harbin 150001, China;The Academy of Fundamental and Interdisciplinary Sciences, Harbin Institute of Technology, Harbin 150001, China;School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China;School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China;School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Venue:
Future Generation Computer Systems
Year:
2013



Abstract

Ad-hoc aggregate queries are extremely important for query-intensive applications in cloud computing: they extract valuable summary information from massive data sets to help decision-makers make the right decisions. Current data storage schemes (row-store and column-store) cannot answer ad-hoc aggregate queries on massive data sets efficiently in cloud computing. This paper proposes a new data storage structure, the bit vector storage structure (bit-store for short), and focuses on ad-hoc aggregate query algorithms based on it. First, the storage model of bit-store is introduced, including its attribute encoding schemes and bit file organization. Second, aggregate operations for query processing are presented for the different encoding schemes. Third, a cost analysis of the aggregate operations is given. Finally, analytical and experimental results show the effectiveness and efficiency of the proposed algorithms.
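
The abstract does not give the paper's encoding schemes or bit file layout, so the following is only a minimal sketch of the general idea of aggregating over bit vectors, not the authors' algorithm. It assumes a plain binary (bit-sliced) encoding of non-negative integers, with each bit position stored as one bitmap held in a Python int. The names build_bit_slices, bit_sliced_sum and bit_sliced_count are hypothetical.

```python
def build_bit_slices(values, width):
    """Return `width` bitmaps; bit i of slice j equals bit j of values[i].

    Assumption: values are non-negative integers that fit in `width` bits.
    """
    slices = [0] * width
    for row, value in enumerate(values):
        if not 0 <= value < (1 << width):
            raise ValueError("value does not fit in the given width")
        for j in range(width):
            if (value >> j) & 1:
                slices[j] |= 1 << row
    return slices


def popcount(bitmap):
    """Count the set bits in a non-negative integer bitmap."""
    return bin(bitmap).count("1")


def bit_sliced_sum(slices, row_mask):
    """SUM over the rows selected by row_mask.

    Each slice contributes 2**j times the number of selected rows
    whose value has bit j set.
    """
    return sum(popcount(s & row_mask) << j for j, s in enumerate(slices))


def bit_sliced_count(row_mask):
    """COUNT of the rows selected by row_mask."""
    return popcount(row_mask)


if __name__ == "__main__":
    column = [5, 3, 7, 0, 6]          # hypothetical attribute values
    slices = build_bit_slices(column, width=3)
    selected = 0b10101                # rows 0, 2 and 4 pass the predicate
    print(bit_sliced_sum(slices, selected), bit_sliced_count(selected))
```

In this sketch an aggregate touches only bitmaps: a selection predicate becomes a row mask, and SUM and COUNT reduce to bitwise AND plus bit counting, without reconstructing individual rows.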