Much compiler-orientated work on mapping parallel programs to parallel architectures has ignored the issue of external workload. Given that most platforms will not be dedicated to a single task at a time, the impact of other jobs needs to be addressed. Because mapping is highly dependent on the underlying machine, a technique that is easily portable across platforms is also desirable. In this paper we develop an approach for predicting the optimal number of threads for a given data-parallel application in the presence of external workload. We achieve 93.7% of the maximum available speedup, which gives an average speedup of 1.66 on 4 cores, a factor of 1.24 better than the OpenMP compiler's default policy. We also develop an alternative cooperative model that minimizes the impact on external workload while still giving an improved average speedup. Finally, we evaluate our approach on a separate 8-core machine, where it gives an average speedup of 1.33 times over the default policy, showing the portability of our approach.
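To make the problem statement concrete, the following is a minimal Python sketch of the two quantities the abstract refers to: a thread count chosen with external load in mind, and the fraction of the best achievable speedup that a given choice obtains. It is not the prediction model developed in the paper. The function names `pick_thread_count` and `fraction_of_best` and the "leave one core per external runnable thread" rule are hypothetical and stand in only for illustration.

```python
import math


def pick_thread_count(total_cores: int, external_load: float) -> int:
    """Hypothetical load-aware heuristic (NOT the paper's model).

    Reserves one core for each externally runnable thread, rounding the
    load up, and always returns a value between 1 and total_cores.
    """
    free_cores = total_cores - math.ceil(external_load)
    return max(1, min(total_cores, free_cores))


def fraction_of_best(speedup_by_threads: dict[int, float], chosen: int) -> float:
    """Fraction of the best measured speedup obtained by the chosen thread count.

    speedup_by_threads maps a thread count to its measured speedup; the
    caller supplies these measurements.
    """
    best = max(speedup_by_threads.values())
    return speedup_by_threads[chosen] / best
```

On Unix-like systems, the first element of `os.getloadavg()` could supply `external_load`, and the chosen value could then be passed to an OpenMP program through the `OMP_NUM_THREADS` environment variable. A figure such as the 93.7% quoted above corresponds to the kind of ratio `fraction_of_best` computes, aggregated over the benchmarks and workloads the authors measured.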