ESQP: an efficient SQL query processing for cloud data management

Authors:
Jing Zhao; Xiangmei Hu; Xiaofeng Meng
Affiliations:
Renmin University of China, Beijing, China; Renmin University of China, Beijing, China; Renmin University of China, Beijing, China
Venue:
CloudDB '10: Proceedings of the Second International Workshop on Cloud Data Management
Year:
2010



Abstract

Cloud computing platforms have recently attracted growing attention as a new direction in data management, and several cloud products now offer a variety of services. However, most cloud platforms are not designed for structured data management, so they rarely support SQL queries directly. Even where a platform does support SQL, its underlying layer is a traditional relational database, so the cost of executing a subquery in the RDBMS may affect overall query performance. Improving query efficiency in cloud data management systems, especially for queries on structured data, has therefore become an increasingly important problem. To address it, we propose an efficient algorithm for query processing on structured data. Our approach is inspired by MapReduce, in which a job is divided into several tasks. Based on the distributed storage of a table, the algorithm divides a user query into subqueries; because the data is replicated in the cloud, each subquery is in turn mapped to k+1 subqueries. Every subquery waits in the queue of the slave that stores the data it queries. To balance the load, the algorithm uses two scheduling strategies to dispatch subqueries. In addition, to shorten the time the client spends waiting, we adopt a pipeline strategy for returning results. Finally, we demonstrate the efficiency and scalability of the algorithm through a range of experiments. The approach is general, independent of the underlying infrastructure, and can easily be implemented on various cloud computing platforms.
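The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the two scheduling strategies, so the shortest-queue rule used here is an assumption, as is the reading that each table fragment is stored on k+1 slaves. All names (`split_query`, `dispatch`, `pipeline_results`, the `placement` mapping, the `{table}` placeholder) are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterator, List, Tuple

# Assumed layout: fragment name -> the k+1 slaves that hold a copy of it.
Placement = Dict[str, List[str]]
Candidates = Dict[str, List[Tuple[str, str]]]


def split_query(sql_template: str, placement: Placement) -> Candidates:
    """Build one subquery per fragment, with one candidate per slave storing it.

    sql_template must contain a literal "{table}" placeholder (an assumption
    of this sketch) and no other braces.
    """
    return {
        fragment: [(slave, sql_template.format(table=fragment)) for slave in slaves]
        for fragment, slaves in placement.items()
    }


def dispatch(candidates: Candidates, queues: Dict[str, Deque[str]]) -> Dict[str, str]:
    """Enqueue each fragment's subquery on the candidate slave with the shortest queue.

    Shortest-queue selection is an assumed strategy, chosen only to show
    load balancing across replicas.
    """
    chosen: Dict[str, str] = {}
    for fragment, options in candidates.items():
        slave, sql = min(options, key=lambda option: len(queues[option[0]]))
        queues[slave].append(sql)
        chosen[fragment] = slave
    return chosen


def pipeline_results(
    queues: Dict[str, Deque[str]],
    execute: Callable[[str, str], object],
) -> Iterator[object]:
    """Yield each partial result as soon as it is produced, instead of
    collecting everything before returning to the client."""
    for slave, queue in queues.items():
        while queue:
            yield execute(slave, queue.popleft())
```

Because `pipeline_results` is a generator, a caller can start consuming partial results while later subqueries are still pending, which is the intent of the pipeline strategy. A real system would run the slaves concurrently; the sequential loop here only keeps the sketch short.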