MapReduce is a new parallel programming model, initially developed for large-scale web content processing. Data analysis faces the problem of how to compute over extremely large datasets. The arrival of MapReduce provides an opportunity to use commodity hardware for massively parallel data analysis applications. The translation and optimization of relational algebra operators into MapReduce programs is still an open and active research field. In this paper, we focus on a special type of data analysis query, namely the multiple group by query. We first study the communication cost of the MapReduce model, then give an initial implementation of the multiple group by query. We then propose an optimized version that addresses and reduces the communication cost. The optimized version shows better speed-up and better scalability than the initial implementation.
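To make the idea of a multiple group by query and its communication cost concrete, the following is a minimal in-memory sketch, not the implementation described in the paper. It assumes a SUM aggregate, simulates map tasks as Python lists, and uses a combiner-style local pre-aggregation as one generic way to reduce the number of pairs sent to reducers; the names `map_record`, `combine`, `run_job`, and the sample columns are all hypothetical.

```python
from collections import defaultdict


def map_record(record, group_by_columns, measure):
    """Emit one ((column, value), measure) pair per grouping column."""
    for col in group_by_columns:
        yield (col, record[col]), record[measure]


def combine(pairs):
    """Pre-aggregate pairs locally, as a combiner would on one map task."""
    partial = defaultdict(int)
    for key, value in pairs:
        partial[key] += value
    return list(partial.items())


def run_job(splits, group_by_columns, measure, use_combiner):
    """Simulate one job; return (results, number of pairs shuffled)."""
    shuffled = []  # pairs that would cross the network to the reducers
    for split in splits:  # each split stands in for one map task
        pairs = [
            pair
            for record in split
            for pair in map_record(record, group_by_columns, measure)
        ]
        if use_combiner:
            pairs = combine(pairs)
        shuffled.extend(pairs)

    # Reduce phase: sum the measure for every (column, value) key.
    result = defaultdict(int)
    for key, value in shuffled:
        result[key] += value
    return dict(result), len(shuffled)


# Hypothetical input, already divided into two splits.
splits = [
    [{"region": "EU", "product": "A", "sales": 10},
     {"region": "EU", "product": "B", "sales": 5}],
    [{"region": "US", "product": "A", "sales": 7},
     {"region": "US", "product": "A", "sales": 3}],
]

plain, plain_cost = run_job(splits, ["region", "product"], "sales", False)
combined, combined_cost = run_job(splits, ["region", "product"], "sales", True)
```

Both runs answer the two group by queries (by `region` and by `product`) in a single pass over the data and produce the same totals. The only difference is the number of intermediate pairs handed to the reduce phase, which serves here as a rough stand-in for communication cost.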