Runtime Empirical Selection of Loop Schedulers on Hyperthreaded SMPs

Authors:
Yun Zhang;Michael Voss
Affiliations:
University of Toronto, Canada;University of Toronto, Canada
Venue:
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Year:
2005

Citing 6
Cited 5

C4.5: programs for machine learning

C4.5: programs for machine learning
Simultaneous multithreading: maximizing on-chip parallelism

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Trapezoid Self-Scheduling: A Practical Scheduling Scheme for Parallel Compilers

IEEE Transactions on Parallel and Distributed Systems
Reducing Parallel Overheads Through Dynamic Serialization

IPPS '99/SPDP '99 Proceedings of the 13th International Symposium on Parallel Processing and the 10th Symposium on Parallel and Distributed Processing
Using Processor Affinity in Loop Scheduling on Shared-Memory Multiprocessors

Using Processor Affinity in Loop Scheduling on Shared-Memory Multiprocessors
Is the schedule clause really necessary in OpenMP?

WOMPAT'03 Proceedings of the OpenMP applications and tools 2003 international conference on OpenMP shared memory parallel programming

Online power-performance adaptation of multithreaded programs using hardware event-based prediction

Proceedings of the 20th annual international conference on Supercomputing
Capturing performance knowledge for automated analysis

Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Mapping parallelism to multi-cores: a machine learning based approach

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Towards a holistic approach to auto-parallelization: integrating profile-driven parallelism detection and machine-learning based mapping

Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Online strategies for high-performance power-aware thread execution on emerging multiprocessors

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Hyperthreaded (HT) and simultaneous multithreaded (SMT) processors are now available in commodity workstations and servers. This technology is designed to increase throughput by executing multiple concurrent threads on a single physical processor. These multiple threads share the processor's functional units and on-chip memory hierarchy in an attempt to make better use of idle resources. Most OpenMP applications have been written assuming an Symmetric Multiprocessor (SMP), not an SMT, model. Threads executing on the same physical processor have interactions on data locality and resource sharing that do not occur on traditional SMPs. This work focuses on tuning the behavior of OpenMP applications executing on SMPs with SMT processors. We propose two adaptive loop schedulers that determine effective hierarchical schedulers for individual parallel loops. We compare the performance of our two proposed schedulers against several standard schedulers and the per-region adaptive scheduler proposed by Zhang et al. using the SPEC and NAS OpenMP benchmark suites. We show that both of our proposed schedulers outperform all other schedulers on average, and increase speedup on average by over 25% when all thread contexts are used.