MapReduce offers excellent scalability and fault tolerance. It fits well with today's dominant distributed architectures, such as clusters and grids, which are usually shared-nothing computing environments. However, using MapReduce for data analysis applications still poses challenges, because MapReduce is a low-level procedural programming paradigm that does not directly support relational algebraic operators. In this work, we addressed a typical data analytic query, the multiple group-by query. We parallelized the calculations involved in this type of query with MapReduce, and we introduced indexing and data partitioning. We measured the speedup of implementations over both horizontally partitioned and vertically partitioned data. We analysed the factors affecting performance through both measurement and formal estimation.
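To make the query type concrete, the following is a minimal single-process sketch of how a multiple group-by query might be expressed as map and reduce steps. It is an illustration only, not the implementation evaluated in this work: the function names, the `SALES` sample records, and the choice of `sum` as the aggregate are all assumptions, and indexing and data partitioning are not modelled.

```python
from collections import defaultdict


def map_record(record, group_by_columns, measure):
    """Map step: emit one ((column, value), measure) pair per group-by column."""
    for column in group_by_columns:
        yield (column, record[column]), record[measure]


def shuffle(pairs):
    """Group intermediate values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce step: aggregate all measures that share a key (sum is assumed here)."""
    return key, sum(values)


def multiple_group_by(records, group_by_columns, measure):
    """Answer several group-by queries over the same records in a single pass."""
    pairs = (
        pair
        for record in records
        for pair in map_record(record, group_by_columns, measure)
    )
    return dict(reduce_group(key, values) for key, values in shuffle(pairs).items())


# Hypothetical sample data, used only for illustration.
SALES = [
    {"region": "north", "product": "a", "amount": 10},
    {"region": "north", "product": "b", "amount": 5},
    {"region": "south", "product": "a", "amount": 7},
]

RESULT = multiple_group_by(SALES, ["region", "product"], "amount")
```

Tagging each intermediate key with the column name lets one scan of the data serve every group-by dimension, so the records are read once rather than once per query.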