MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
MapReduce optimization using regulated dynamic prioritization
Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
Towards automatic optimization of MapReduce programs
Proceedings of the 1st ACM symposium on Cloud computing
MOON: MapReduce On Opportunistic eNvironments
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
Improving MapReduce performance in heterogeneous environments
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Reining in the outliers in map-reduce clusters using Mantri
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Pilot-MapReduce: an extensible and flexible MapReduce implementation for distributed data
Proceedings of third international workshop on MapReduce and its Applications Date
Time and Cost Sensitive Data-Intensive Computing on Hybrid Clouds
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Hierarchical MapReduce Programming Model and Scheduling Algorithms
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Network-aware scheduling of mapreduce framework ondistributed clusters over high speed networks
Proceedings of the 2012 workshop on Cloud services, federation, and the 8th open cirrus summit
Electric grid balancing through lowcost workload migration
ACM SIGMETRICS Performance Evaluation Review
Proceedings of the 3rd international workshop on Emerging computational methods for the life sciences
Trustworthy distributed computing on social networks
Proceedings of the 8th ACM SIGSAC symposium on Information, computer and communications security
A case for MapReduce over the internet
Proceedings of the 2013 ACM Cloud and Autonomic Computing Conference
Hi-index | 0.00 |
MapReduce is a highly-popular paradigm for high-performance computing over large data sets in large-scale platforms. However, when the source data is widely distributed and the computing platform is also distributed, e.g. data is collected in separate data center locations, the most efficient architecture for running Hadoop jobs over the entire data set becomes non-trivial. In this paper, we show the traditional single-cluster MapReduce setup may not be suitable for situations when data and compute resources are widely distributed. Further, we provide recommendations for alternative (and even hierarchical) distributed MapReduce setup configurations, depending on the workload and data set.