C4.5: programs for machine learning
Simultaneous multithreading: maximizing on-chip parallelism. ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture.
Trapezoid Self-Scheduling: A Practical Scheduling Scheme for Parallel Compilers. IEEE Transactions on Parallel and Distributed Systems.
Reducing Parallel Overheads Through Dynamic Serialization. IPPS '99/SPDP '99 Proceedings of the 13th International Symposium on Parallel Processing and the 10th Symposium on Parallel and Distributed Processing.
Using Processor Affinity in Loop Scheduling on Shared-Memory Multiprocessors
Is the schedule clause really necessary in OpenMP? WOMPAT'03 Proceedings of the OpenMP applications and tools 2003 international conference on OpenMP shared memory parallel programming.
Online power-performance adaptation of multithreaded programs using hardware event-based prediction. Proceedings of the 20th annual international conference on Supercomputing.
Capturing performance knowledge for automated analysis. Proceedings of the 2008 ACM/IEEE conference on Supercomputing.
Mapping parallelism to multi-cores: a machine learning based approach. Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming.
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Online strategies for high-performance power-aware thread execution on emerging multiprocessors. IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing.
Hyper-Threaded (HT) and simultaneous multithreaded (SMT) processors are now available in commodity workstations and servers. This technology is designed to increase throughput by executing multiple concurrent threads on a single physical processor. These threads share the processor's functional units and on-chip memory hierarchy in an attempt to make better use of otherwise idle resources. Most OpenMP applications have been written assuming a symmetric multiprocessor (SMP) model rather than an SMT model. Threads executing on the same physical processor interact through data locality and resource sharing in ways that do not occur on traditional SMPs. This work focuses on tuning the behavior of OpenMP applications executing on SMPs built from SMT processors. We propose two adaptive loop schedulers that determine effective hierarchical schedulers for individual parallel loops. Using the SPEC and NAS OpenMP benchmark suites, we compare the performance of our two schedulers against several standard schedulers and against the per-region adaptive scheduler proposed by Zhang et al. We show that both proposed schedulers outperform all other schedulers on average and, when all thread contexts are used, increase speedup by over 25% on average.
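The abstract does not spell out how a hierarchical schedule is formed or chosen. As a rough illustration only, the C sketch below assumes a two-level static partition (iterations are split first across physical processors, then across the SMT thread contexts on each processor) and a naive per-loop tuner that times each candidate schedule once and then commits to the fastest. The names `block_range`, `hierarchical_range`, `loop_tuner`, and the `tuner_*` functions are hypothetical; this is not the authors' scheduler, and the proposed schedulers may explore the candidate space differently.

```c
#include <assert.h>

/* Hypothetical helper: split [begin, end) into `parts` nearly equal
   contiguous blocks and return block `k` as the half-open range [*lo, *hi). */
static void block_range(long begin, long end, int parts, int k,
                        long *lo, long *hi) {
    assert(parts > 0 && k >= 0 && k < parts);
    long n = end - begin;
    long base = n / parts, extra = n % parts;
    /* The first `extra` blocks each receive one additional iteration. */
    *lo = begin + k * base + (k < extra ? k : extra);
    *hi = *lo + base + (k < extra ? 1 : 0);
}

/* Assumed two-level static schedule: the outer level partitions the loop
   across physical processors, the inner level partitions each processor's
   block across that processor's SMT thread contexts. */
static void hierarchical_range(long n, int procs, int ctx_per_proc,
                               int proc, int ctx, long *lo, long *hi) {
    long plo, phi;
    block_range(0, n, procs, proc, &plo, &phi);        /* outer level */
    block_range(plo, phi, ctx_per_proc, ctx, lo, hi);  /* inner level */
}

/* Hypothetical per-loop tuner: candidate schedules are identified by index.
   Each candidate is tried once; afterwards the fastest one is reused. */
typedef struct {
    int ncand;        /* number of candidate hierarchical schedules */
    int trials_done;  /* how many candidates have been timed so far */
    double best_time; /* lowest time observed during the trial phase */
    int best;         /* index of the fastest candidate observed */
} loop_tuner;

static void tuner_init(loop_tuner *t, int ncand) {
    assert(ncand > 0);
    t->ncand = ncand;
    t->trials_done = 0;
    t->best_time = 0.0;
    t->best = 0;
}

/* Returns the candidate to use for the next execution of the loop. */
static int tuner_next(const loop_tuner *t) {
    return t->trials_done < t->ncand ? t->trials_done : t->best;
}

/* Records the measured time of one loop execution under candidate `cand`. */
static void tuner_record(loop_tuner *t, int cand, double seconds) {
    if (t->trials_done >= t->ncand)
        return; /* trial phase is over; keep the committed choice */
    if (t->trials_done == 0 || seconds < t->best_time) {
        t->best_time = seconds;
        t->best = cand;
    }
    t->trials_done++;
}
```

For example, with 10 iterations, 2 physical processors, and 2 contexts per processor, `hierarchical_range` assigns [0, 3) and [3, 5) to the contexts of processor 0, and [5, 8) and [8, 10) to those of processor 1. A real runtime would wrap each timed loop execution with `tuner_next` and `tuner_record`, and the candidates would pair different outer and inner policies (for instance static or dynamic at each level).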