The grid: blueprint for a new computing infrastructure
The grid: blueprint for a new computing infrastructure
Simgrid: A Toolkit for the Simulation of Application Scheduling
CCGRID '01 Proceedings of the 1st International Symposium on Cluster Computing and the Grid
Performance Evaluation Model for Scheduling in Global Computing Systems
International Journal of High Performance Computing Applications
Experiences with MapReduce, an abstraction for large-scale computation
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
The MicroGrid: A scientific tool for modeling Computational Grids
Scientific Programming
Evaluating MapReduce for Multi-core and Multiprocessor Systems
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
MapReduce: simplified data processing on large clusters
Communications of the ACM - 50th anniversary issue: 1958 - 2008
Measurement and analysis of TCP throughput collapse in cluster-based storage systems
FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
X-trace: a pervasive network tracing framework
NSDI'07 Proceedings of the 4th USENIX conference on Networked systems design & implementation
Delay tails in MapReduce scheduling
Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems
MapReduce Workload Modeling with Statistical Approach
Journal of Grid Computing
Systematic approach of using power save mode for cloud data processing services
International Journal of Ad Hoc and Ubiquitous Computing
HSim: A MapReduce simulator in enabling Cloud Computing
Future Generation Computer Systems
On modelling and prediction of total CPU usage for applications in mapreduce environments
ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
MRSG - A MapReduce simulator over SimGrid
Parallel Computing
Hi-index | 0.00 |
Recently, there has been a huge growth in the amount of data processed by enterprises and the scientific computing community. Two promising trends ensure that applications will be able to deal with ever increasing data volumes: First, the emergence of cloud computing, which provides transparent access to a large number of compute, storage and networking resources; and second, the development of the MapReduce programming model, which provides a high-level abstraction for data-intensive computing. However, the design space of these systems has not been explored in detail. Specifically, the impact of various design choices and run-time parameters of a MapReduce system on application performance remains an open question. To this end, we embarked on systematically understanding the performance of MapReduce systems, but soon realized that understanding effects of parameter tweaking in a large-scale setup with many variables was impractical. Consequently, in this paper, we present the design of an accurate MapReduce simulator, MRPerf, for facilitating exploration of MapReduce design space. MRPerf captures various aspects of a MapReduce setup, and uses this information to predict expected application performance. In essence, MRPerf can serve as a design tool for MapReduce infrastructure, and as a planning tool for making MapReduce deployment far easier via reduction in the number of parameters that currently have to be hand-tuned using rules of thumb. Our validation of MRPerf using data from medium-scale production clusters shows that it is able to predict application performance accurately, and thus can be a useful tool in enabling cloud computing. Moreover, an initial application of MRPerf to our test clusters running Hadoop, revealed a performance bottleneck, fixing which resulted in up to 28.05% performance improvement.