Multilevel k-way partitioning scheme for irregular graphs
Journal of Parallel and Distributed Computing
Design and Prototype of a Performance Tool Interface for OpenMP
The Journal of Supercomputing
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Feedback-directed thread scheduling with memory considerations
Proceedings of the 16th international symposium on High performance distributed computing
Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Data and thread affinity in openmp programs
Proceedings of the 2008 workshop on Memory access on future processors: a solved problem?
The PARSEC benchmark suite: characterization and architectural implications
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs?
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Cache topology aware computation mapping for multicores
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Thread tailor: dynamically weaving threads together for efficient, adaptive parallel applications
Proceedings of the 37th annual international symposium on Computer architecture
autopin: automated optimization of thread-to-core pinning on multicore systems
Transactions on high-performance embedded architectures and compilers III
Hi-index | 0.00 |
Thread affinity has appeared as an important technique to improve the overall program performance and for better performance stability. However, if we consider a program with multiple phases, it is unlikely that a single thread affinity produces the best program performance for all these phases. If we consider the case of OpenMP, applications may have multiple parallel regions, each with a distinct inter-thread data sharing pattern. In this paper, we propose an approach that allows to change thread affinity dynamically (thread migrations) between parallel regions at runtime to account for these distinct inter-thread data sharing patterns. We demonstrate that as far as cache sharing is concerned for SPEC OMP01, not all the tested OpenMP applications exhibit a distinct phase behavior. However, we show that while fixing thread affinity for the whole execution may improve performance by up to 30%, allowing dynamic thread pinning may improve performance by up to 40%. Furthermore, we provide an analysis about the required conditions to improve the effectiveness of the approach.