Several recent works optimize MapReduce execution plans, either by optimizing the join operators or by applying a set of translation rules to reduce the number of MapReduce jobs in a plan. However, none of these works considers or exploits how MapReduce jobs are generated and combined. To further improve the efficiency of MapReduce execution plans, we incorporate the way MapReduce jobs are generated and combined into our optimization approach. In this paper, we propose MRPacker, a novel SQL-to-MapReduce optimizer that (a) uses a set of transformation rules to reduce the number of MapReduce jobs, and (b) merges MapReduce jobs in a more reasonable way. Experiments on the TPC-H benchmark demonstrate the effectiveness and efficiency of MRPacker.
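
To make the idea of merging MapReduce jobs concrete, the following is a toy sketch only, not MRPacker's actual rule set or merging algorithm, which the abstract does not specify. It assumes a simple, hypothetical criterion: jobs that scan the same input tables and shuffle on the same key can share one job. The `Job` class and `merge_jobs` function are illustrative names, and data dependencies between jobs are ignored.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """A hypothetical, simplified description of one MapReduce job."""
    name: str           # operator the job implements, e.g. an aggregation
    inputs: frozenset   # tables the job scans
    key: str            # partition (shuffle) key


def merge_jobs(jobs):
    """Group jobs that scan the same inputs and shuffle on the same key.

    Assumption: such jobs can run as a single job. Dependencies between
    jobs are not modeled in this sketch.
    """
    groups = defaultdict(list)
    for job in jobs:
        groups[(job.inputs, job.key)].append(job)

    merged = []
    for (inputs, key), members in groups.items():
        # The merged job carries the names of all operators it now serves.
        merged.append(Job("+".join(j.name for j in members), inputs, key))
    return merged


# A made-up three-job plan over TPC-H tables.
plan = [
    Job("agg_qty", frozenset({"lineitem"}), "l_orderkey"),
    Job("agg_price", frozenset({"lineitem"}), "l_orderkey"),
    Job("join_orders", frozenset({"lineitem", "orders"}), "o_orderkey"),
]
packed = merge_jobs(plan)
print(len(plan), "->", len(packed))
```

In this made-up plan the two aggregations over `lineitem` collapse into one job, so the plan shrinks from three jobs to two. A real optimizer would also have to check that merged jobs do not depend on each other's output.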