Run-time disambiguation: coping with statically unpredictable dependencies
IEEE Transactions on Computers
A scalable approach to thread-level speculation
Proceedings of the 27th annual international symposium on Computer architecture
IEEE Micro
Programmer specified pointer independence
MSP '04 Proceedings of the 2004 workshop on Memory system performance
Subroutine profiling results for the CPU2006 benchmarks
ACM SIGARCH Computer Architecture News
Sensitivity analysis for automatic parallelization on multi-cores
Proceedings of the 21st annual international conference on Supercomputing
A Practical Approach to Exploiting Coarse-Grained Pipeline Parallelism in C Programs
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Compiler-Driven Dependence Profiling to Guide Program Parallelization
Languages and Compilers for Parallel Computing
Stretching transactional memory
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
The Paralax infrastructure: automatic parallelization with a helping hand
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
DISC'06 Proceedings of the 20th international conference on Distributed Computing
Hi-index | 0.01 |
Parallel software is now required to exploit the abundance of threads and processors in modern multicore computers. Unfortunately, manual parallelization of software is too time-consuming and error-prone for all but the most advanced programmers. While automatic parallelization promises threaded software with little programmer effort, current auto-parallelizers can be easily thwarted by pointers, complex control flow, and other forms of ambiguity in the code. In this paper we explore in detail the loops in SPEC CPU2006 applications, categorize the loops in terms of available parallelism, and focus on promising loops that are not parallelized by IBM's XL C/C++ V10 auto-parallelization facility. For those loops we propose methods of improved interaction between the programmer and compiler that can facilitate their parallelization. In particular, we (i) suggest methods for the compiler to better identify to the programmer the parallelization-blockers that it finds; (ii) suggest methods for the programmer to provide guarantees to the compiler that overcome these parallelization-blockers; and (iii) evaluate the resulting impact on performance.