autopin: automated optimization of thread-to-core pinning on multicore systems

Authors:
Tobias Klug;Michael Ott;Josef Weidendorfer;Carsten Trinitis
Affiliations:
Technische Universität München, Lehrstuhl für Rechnertechnik und Rechnerorganisation, Parallelrechnerarchitektur, Garching bei München (all authors)
Venue:
Transactions on high-performance embedded architectures and compilers III
Year:
2011



Abstract

In this paper we present a framework that automatically detects and applies the best binding between the threads of a running parallel application and the processor cores of a shared-memory system, using hardware performance counters. This is especially important on multicore architectures with shared cache levels. We demonstrate that the runtime of many applications from the SPEC OMP benchmark suite is sensitive to the thread/core binding used. In our tests, the proposed framework finds the best binding in nearly all cases. The framework is intended to supplement job scheduling systems so that systems with multicore processors are exploited better automatically, and to make programmers aware of this issue by providing measurement logs.
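The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `pick_best_binding` and `measure_rate` are hypothetical, and the choice of "higher rate is better" as the criterion is an assumption. The callback stands in for whatever applies a binding to the running threads and reads a hardware performance counter over a sample interval.

```python
from typing import Callable, Dict, Sequence, Tuple

# Binding[i] is the core id assigned to thread i (illustrative convention).
Binding = Tuple[int, ...]


def pick_best_binding(
    candidates: Sequence[Binding],
    measure_rate: Callable[[Binding], float],
) -> Tuple[Binding, Dict[Binding, float]]:
    """Try each candidate binding once and return the highest-rated one.

    measure_rate is a hypothetical callback: it is assumed to apply the
    binding, sample a performance counter for a while, and return a rate
    where larger means better. The returned dict acts as a measurement log.
    """
    if not candidates:
        raise ValueError("need at least one candidate binding")

    log: Dict[Binding, float] = {}
    for binding in candidates:
        log[binding] = measure_rate(binding)

    # max() keeps the first candidate on ties, so the order of the
    # candidate list decides between equally rated bindings.
    best = max(candidates, key=lambda b: log[b])
    return best, log
```

In a real setting the callback would have to pin the threads (for example through the operating system's affinity interface) and read counters from the hardware; here it is left abstract so that the selection logic can be exercised with synthetic rates.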