A cost-aware parallel workload allocation approach based on machine learning techniques

Authors:
Shun Long; Grigori Fursin; Björn Franke
Affiliations:
Department of Computer Science, Jinan University, Guangzhou, P.R. China; INRIA Futurs and LRI, Paris-Sud University, France; Institute for Computing Systems Architecture, The University of Edinburgh, UK
Venue:
NPC'07 Proceedings of the 2007 IFIP international conference on Network and parallel computing
Year:
2007

Abstract

Parallelism is one of the main sources of performance improvement in modern computing environments, but exploiting the available parallelism efficiently depends on a number of parameters. Determining the optimum number of threads for a given data-parallel loop, for example, is a difficult problem whose answer depends on the specific parallel platform. This paper presents a learning-based approach to parallel workload allocation in a cost-aware manner. The approach classifies programs by their static features, then selects the best workload allocation scheme based on prior experience with similar programs. Experimental results on 12 Java benchmarks (76 test cases in total, with different workloads) show that it allocates the parallel workload among Java threads efficiently and achieves an average efficiency of 86%.
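
The general idea of classifying a program by static features and reusing the allocation that worked for similar programs can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not name the features, the classifier, or the allocation schemes, so the nearest-neighbour lookup, the three feature values, the recorded thread counts, and the helper names `predict_threads` and `split_iterations` are all assumptions made for this example.

```python
import math

# Hypothetical prior experience: (static feature vector, best thread count
# found offline). The feature meanings (loop nest depth, statements in the
# loop body, array accesses) and all values are illustrative assumptions.
PRIOR_EXPERIENCE = [
    ((1.0, 4.0, 2.0), 2),
    ((2.0, 12.0, 6.0), 4),
    ((3.0, 30.0, 15.0), 8),
]


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_threads(features, experience=PRIOR_EXPERIENCE):
    """Return the thread count recorded for the most similar known program."""
    _, threads = min(experience, key=lambda entry: distance(entry[0], features))
    return threads


def split_iterations(n_iterations, n_threads):
    """Divide a loop's iteration space into contiguous, near-equal chunks.

    Returns a list of half-open (start, end) ranges, one per thread.
    """
    base, extra = divmod(n_iterations, n_threads)
    chunks, start = [], 0
    for t in range(n_threads):
        size = base + (1 if t < extra else 0)  # spread the remainder
        chunks.append((start, start + size))
        start += size
    return chunks


if __name__ == "__main__":
    new_loop = (2.0, 11.0, 5.0)  # features of an unseen loop (made up)
    threads = predict_threads(new_loop)
    print(threads, split_iterations(10, threads))
```

A nearest-neighbour lookup is used here only because it is the simplest way to express "prior experience with similar programs"; any classifier trained on the same feature vectors could take its place.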