Approximating total flow time on parallel machines
STOC '97 Proceedings of the twenty-ninth annual ACM symposium on Theory of computing
Minimizing Total Completion Time in a Two-Machine
Mathematics of Operations Research
Computers and Intractability: A Guide to the Theory of NP-Completeness
Computers and Intractability: A Guide to the Theory of NP-Completeness
Scheduling Strategy to improve Response Time for Web Applications
HPCN Europe 1998 Proceedings of the International Conference and Exhibition on High-Performance Computing and Networking
Size-based scheduling to improve web performance
ACM Transactions on Computer Systems (TOCS)
Classifying scheduling policies with respect to unfairness in an M/GI/1
SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Approximability of flow shop scheduling
FOCS '95 Proceedings of the 36th Annual Symposium on Foundations of Computer Science
A resource-allocation queueing fairness measure
Proceedings of the joint international conference on Measurement and modeling of computer systems
Simulation Evaluation of Hybrid SRPT Scheduling Policies
MASCOTS '04 Proceedings of the The IEEE Computer Society's 12th Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems
Classifying scheduling policies with respect to higher moments of conditional response time
SIGMETRICS '05 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Revisiting unfairness in web server scheduling
Computer Networks: The International Journal of Computer and Telecommunications Networking
ACM SIGMETRICS Performance Evaluation Review
Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
MapReduce: simplified data processing on large clusters
Communications of the ACM - 50th anniversary issue: 1958 - 2008
Quincy: fair scheduling for distributed computing clusters
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems
An Analysis of Traces from a Production MapReduce Cluster
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Improving MapReduce performance in heterogeneous environments
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Reining in the outliers in map-reduce clusters using Mantri
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
On scheduling in map-reduce and flow-shops
Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
ARIA: automatic resource inference and allocation for mapreduce environments
Proceedings of the 8th ACM international conference on Autonomic computing
HiTune: dataflow-based performance analysis for big data cloud
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
Purlieus: locality-aware resource allocation for MapReduce in a cloud
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Modeling the Performance of the Hadoop Online Prototype
SBAC-PAD '11 Proceedings of the 2011 23rd International Symposium on Computer Architecture and High Performance Computing
Locality-Aware Reduce Task Scheduling for MapReduce
CLOUDCOM '11 Proceedings of the 2011 IEEE Third International Conference on Cloud Computing Technology and Science
PACMan: coordinated memory caching for parallel jobs
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Delay tails in MapReduce scheduling
Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems
Investigation of Data Locality in MapReduce
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
MASCOTS '12 Proceedings of the 2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems
True elasticity in multi-tenant data-intensive compute clusters
Proceedings of the Third ACM Symposium on Cloud Computing
Workload characterization on a production Hadoop cluster: A case study on Taobao
IISWC '12 Proceedings of the 2012 IEEE International Symposium on Workload Characterization (IISWC)
Hi-index | 0.00 |
MapReduce is a scalable parallel computing framework for big data processing. It exhibits multiple processing phases, and thus an efficient job scheduling mechanism is crucial for ensuring efficient resource utilization. There are a variety of scheduling challenges within the MapReduce architecture, and this paper studies the challenges that result from the overlapping of the ''map'' and ''shuffle'' phases. We propose a new, general model for this scheduling problem, and validate this model using cluster experiments. Further, we prove that scheduling to minimize average response time in this model is strongly NP-hard in the offline case and that no online algorithm can be constant-competitive. However, we provide two online algorithms that match the performance of the offline optimal when given a slightly faster service rate (i.e., in the resource augmentation framework). Finally, we validate the algorithms using a workload trace from a Google cluster and show that the algorithms are near optimal in practical settings.