YSmart: Yet Another SQL-to-MapReduce Translator

Authors:
Rubao Lee, Tian Luo, Yin Huai, Fusheng Wang, Yongqiang He, Xiaodong Zhang
Venue:
ICDCS '11 Proceedings of the 2011 31st International Conference on Distributed Computing Systems
Year:
2011

Abstract

MapReduce has become an effective approach to big data analytics in large cluster systems, where SQL-like queries serve as an important interface between users and systems. However, based on our daily operation results at Facebook, Hive (a production SQL-to-MapReduce translator) executes certain types of queries at an unacceptably low speed. In this paper, we demonstrate that existing SQL-to-MapReduce translators, which operate in a one-operation-to-one-job mode and do not consider query correlations, cannot generate high-performance MapReduce programs for certain queries because of the mismatch between complex SQL structures and the simple MapReduce framework. We propose and develop YSmart, a correlation-aware SQL-to-MapReduce translator. YSmart applies a set of rules to execute multiple correlated operations in a complex query with the minimal number of MapReduce jobs. Compared to existing translators, YSmart can significantly reduce redundant computations, I/O operations, and network transfers. We have implemented YSmart and evaluated it extensively with complex queries on two Amazon EC2 clusters and one Facebook production cluster. The results show that YSmart can outperform Hive and Pig, two widely used SQL-to-MapReduce translators, by more than four times in query execution.
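To make the contrast between one-operation-to-one-job translation and correlation-aware translation concrete, here is a minimal sketch under stated assumptions. It is not YSmart's actual rule set, which the abstract does not spell out. The sketch models a query plan as a tree of shuffle-requiring operators (joins and group-bys), each tagged with the key it partitions on. It applies a single simplified merging rule: a child that partitions on the same key as its parent is folded into the parent's job, because one shuffle on that key can feed both operators. The names `Op`, `Job`, `one_operation_one_job`, `correlation_aware`, and the example plan (`agg_by_cust`, `join_on_part`, `agg_by_part`) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Op:
    """A shuffle-requiring operator (join or group-by) in a hypothetical plan tree."""
    name: str
    partition_key: str  # column the operator must partition its input on
    children: List["Op"] = field(default_factory=list)


@dataclass
class Job:
    """One MapReduce job: a single shuffle on `key` shared by every operator in `ops`."""
    key: str
    ops: List[str] = field(default_factory=list)


def one_operation_one_job(root: Op) -> List[Job]:
    """Naive translation: every operator becomes its own MapReduce job."""
    jobs = [Job(root.partition_key, [root.name])]
    for child in root.children:
        jobs.extend(one_operation_one_job(child))
    return jobs


def correlation_aware(root: Op) -> List[Job]:
    """Simplified merging (an assumption, not YSmart's full rules): a child that
    partitions on the same key as its parent shares the parent's job."""
    jobs: List[Job] = []

    def visit(node: Op, current: Optional[Job]) -> None:
        if current is None or node.partition_key != current.key:
            current = Job(node.partition_key)  # key changes, so a new shuffle is needed
            jobs.append(current)
        current.ops.append(node.name)
        for child in node.children:
            visit(child, current)

    visit(root, None)
    return jobs


# Hypothetical plan: an aggregation by custkey over a join on partkey,
# whose input is itself an aggregation by partkey.
plan = Op("agg_by_cust", "custkey", [
    Op("join_on_part", "partkey", [
        Op("agg_by_part", "partkey"),
    ]),
])

print(len(one_operation_one_job(plan)))       # 3
print([j.ops for j in correlation_aware(plan)])
# [['agg_by_cust'], ['join_on_part', 'agg_by_part']]
```

The naive translator emits three jobs, while the merged plan needs two, because the join and the inner aggregation both partition on `partkey`. Each job saved avoids writing an intermediate result to the distributed file system and shuffling it again, which is the kind of redundant I/O and network transfer the abstract describes. Other correlations, such as sibling operators that scan the same input table, are deliberately left out of this sketch.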