Timely and cost-effective processing of large datasets has become a critical ingredient for the success of many academic, government, and industrial organizations. The combination of MapReduce frameworks and cloud computing is an attractive proposition for these organizations. However, even to run a single program in a MapReduce framework, users or system administrators must set a number of tuning parameters. Users often run into performance problems because they do not know how to set these parameters, or because they do not even know that these parameters exist. Because MapReduce is a relatively new technology, qualified administrators are hard to find. In this position paper, we make a case for techniques that automate the setting of tuning parameters for MapReduce programs. The objective is to provide good out-of-the-box performance for ad hoc MapReduce programs run on large datasets. This feature can go a long way towards improving the productivity of users who lack the skills to optimize programs themselves because they are unfamiliar with MapReduce or with the data being processed.
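To make "automatically setting a tuning parameter" concrete, the sketch below derives one parameter, the number of reduce tasks (for example `mapreduce.job.reduces` in Hadoop 2 and later), from the cluster's shape instead of asking the user. It is not the technique this paper proposes. It encodes the static rule of thumb from the Hadoop MapReduce tutorial, which suggests 0.95 or 1.75 times the total number of reduce slots in the cluster. The function name `suggest_reduce_tasks` and its parameters are hypothetical, chosen for illustration only.

```python
import math


def suggest_reduce_tasks(num_nodes: int, reduce_slots_per_node: int,
                         one_wave: bool = True) -> int:
    """Hypothetical helper: pick a reducer count from cluster size.

    Follows the Hadoop tutorial rule of thumb (an assumption here, not the
    paper's method): 0.95 x total reduce slots so that all reducers start
    in a single wave, or 1.75 x total slots so that faster nodes run a
    second wave and balance the load.
    """
    if num_nodes <= 0 or reduce_slots_per_node <= 0:
        raise ValueError("cluster size and slots per node must be positive")
    total_slots = num_nodes * reduce_slots_per_node
    factor = 0.95 if one_wave else 1.75
    # Round down so a single wave never exceeds the available slots,
    # but always schedule at least one reducer.
    return max(1, math.floor(factor * total_slots))
```

For example, a 10-node cluster with 2 reduce slots per node yields 19 reduce tasks for a single wave, or 35 for two waves. A static rule like this ignores the program and its data; the automated techniques argued for above would presumably also take those into account.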