Modeling the relative fitness of storage
Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
MapReduce: simplified data processing on large clusters
Communications of the ACM - 50th anniversary issue: 1958 - 2008
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Pfp: parallel fp-growth for query recommendation
Proceedings of the 2008 ACM conference on Recommender systems
Understanding Performance Interference of I/O Workload in Virtualized Cloud Environments
CLOUD '10 Proceedings of the 2010 IEEE 3rd International Conference on Cloud Computing
To compress or not to compress - compute vs. IO tradeoffs for mapreduce energy efficiency
Proceedings of the first ACM SIGCOMM workshop on Green networking
Towards optimizing hadoop provisioning in the cloud
HotCloud'09 Proceedings of the 2009 conference on Hot topics in cloud computing
Automatic optimization for MapReduce programs
Proceedings of the VLDB Endowment
ARIA: automatic resource inference and allocation for mapreduce environments
Proceedings of the 8th ACM international conference on Autonomic computing
Proceedings of the 2nd ACM Symposium on Cloud Computing
CoScan: cooperative scan sharing in the cloud
Proceedings of the 2nd ACM Symposium on Cloud Computing
Query optimization for massively parallel data processing
Proceedings of the 2nd ACM Symposium on Cloud Computing
No one (cluster) size fits all: automatic cluster sizing for data-intensive analytics
Proceedings of the 2nd ACM Symposium on Cloud Computing
Pesto: online storage performance management in virtualized datacenters
Proceedings of the 2nd ACM Symposium on Cloud Computing
Trojan data layouts: right shoes for a running elephant
Proceedings of the 2nd ACM Symposium on Cloud Computing
TRACON: interference-aware scheduling for data-intensive applications in virtualized environments
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
MapReduce Workload Modeling with Statistical Approach
Journal of Grid Computing
Hi-index | 0.00 |
Data intensive applications such as MapReduce can have large performance degradation from the effects of I/O interference when multiple processes access the same I/O resources simultaneously, particularly in the case of disks. It is necessary to understand this effect in order to improve resource allocation and utilization for these applications. In this paper, we propose a model for predicting the impact of I/O interference on MapReduce application performance. Our model takes basic parameters of the workload and hardware environment, and knowledge of the I/O behavior of the application to predict how I/O interference affects the scalability of an application. We compare the model's predictions for several workloads (TeraSort, WordCount, PFP Growth and PageRank) against the actual behavior of those workloads in a real cluster environment, and confirm that our model can provide highly accurate predictions.