Pattern Classification (2nd Edition)
Pattern Classification (2nd Edition)
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Towards automatic optimization of MapReduce programs
Proceedings of the 1st ACM symposium on Cloud computing
Topology-aware resource allocation for data-intensive workloads
Proceedings of the first ACM asia-pacific workshop on Workshop on systems
Topology-aware resource allocation for data-intensive workloads
ACM SIGCOMM Computer Communication Review
Towards improved load balancing for data intensive distributed computing
Proceedings of the 2011 ACM Symposium on Applied Computing
A hadoop-based packet trace processing tool
TMA'11 Proceedings of the Third international conference on Traffic monitoring and analysis
ARIA: automatic resource inference and allocation for mapreduce environments
Proceedings of the 8th ACM international conference on Autonomic computing
No one (cluster) size fits all: automatic cluster sizing for data-intensive analytics
Proceedings of the 2nd ACM Symposium on Cloud Computing
Purlieus: locality-aware resource allocation for MapReduce in a cloud
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
A Load-Driven Task Scheduler with Adaptive DSC for MapReduce
GREENCOM '11 Proceedings of the 2011 IEEE/ACM International Conference on Green Computing and Communications
MATE-EC2: a middleware for processing data with AWS
Proceedings of the 2011 ACM international workshop on Many task computing on grids and supercomputers
Resource provisioning framework for mapreduce jobs with performance goals
Middleware'11 Proceedings of the 12th ACM/IFIP/USENIX international conference on Middleware
Panacea: towards holistic optimization of MapReduce applications
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Time and Cost Sensitive Data-Intensive Computing on Hybrid Clouds
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Automated profiling and resource management of pig programs for meeting service level objectives
Proceedings of the 9th international conference on Autonomic computing
AROMA: automated resource allocation and configuration of mapreduce environment in the cloud
Proceedings of the 9th international conference on Autonomic computing
Automatic task slots assignment in Hadoop MapReduce
Proceedings of the 1st Workshop on Architectures and Systems for Big Data
Bridging the tenant-provider gap in cloud services
Proceedings of the Third ACM Symposium on Cloud Computing
On modelling and prediction of total CPU usage for applications in mapreduce environments
ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
Resource provisioning framework for MapReduce jobs with performance goals
Proceedings of the 12th International Middleware Conference
Cumulon: optimizing statistical data analysis in the cloud
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Modeling I/O interference for data intensive distributed applications
Proceedings of the 28th Annual ACM Symposium on Applied Computing
Building an on-demand virtual computing market in non-commercial communities
Proceedings of the 28th Annual ACM Symposium on Applied Computing
Mammoth: autonomic data processing framework for scientific state-transition applications
Proceedings of the 2013 ACM Cloud and Autonomic Computing Conference
Performance Modeling and Optimization of Deadline-Driven Pig Programs
ACM Transactions on Autonomous and Adaptive Systems (TAAS)
A data-centric heuristic for Hadoop provisioning in the cloud
Proceedings of the 6th ACM India Computing Convention
Proceedings of the 4th annual Symposium on Cloud Computing
Gunther: search-based auto-tuning of mapreduce
Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
Hi-index | 0.01 |
Data analytics is becoming increasingly prominent in a variety of application areas ranging from extracting business intelligence to processing data from scientific studies. MapReduce programming paradigm lends itself well to these data-intensive analytics jobs, given its ability to scale-out and leverage several machines to parallely process data. In this work we argue that such MapReduce-based analytics are particularly synergistic with the pay-as-you-go model of a cloud platform. However, a key challenge facing end-users in this environment is the ability to provision MapReduce applications to minimize the incurred cost, while obtaining the best performance. This paper firstmotivates the importance of optimally provisioning a MapReduce job, and demonstrates that existing approaches can result in far from optimal provisioning. We then present a preliminary approach that improves MapReduce provisioning by analyzing and comparing resource consumption of the application at hand with a database of similar resource consumption signatures of other applications.