Programmer-assisted automatic parallelization

Authors: Diego Huang; J. Gregory Steffan
Affiliations: University of Toronto; University of Toronto
Venue: Proceedings of the 2011 Conference of the Center for Advanced Studies on Collaborative Research
Year: 2011

Abstract

Parallel software is now required to exploit the abundance of threads and processors in modern multicore computers. Unfortunately, manual parallelization of software is too time-consuming and error-prone for all but the most advanced programmers. While automatic parallelization promises threaded software with little programmer effort, current auto-parallelizers are easily thwarted by pointers, complex control flow, and other forms of ambiguity in the code. In this paper, we examine in detail the loops in SPEC CPU2006 applications, categorize them by available parallelism, and focus on promising loops that IBM's XL C/C++ V10 auto-parallelization facility does not parallelize. For those loops, we propose methods of improved interaction between the programmer and the compiler that can facilitate their parallelization. In particular, we (i) suggest methods for the compiler to better identify to the programmer the parallelization-blockers that it finds; (ii) suggest methods for the programmer to provide guarantees to the compiler that overcome these parallelization-blockers; and (iii) evaluate the resulting impact on performance.
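
To make the idea of a programmer-supplied guarantee concrete, the following is a minimal sketch of one well-known mechanism of this kind: the C99 `restrict` qualifier, which promises the compiler that two pointers do not alias. The abstract does not say that the authors use this particular mechanism; the function name `scale_add` and its parameters are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical kernel, not taken from the paper.
 *
 * Without `restrict`, the compiler must assume that dst and src may
 * overlap. A write to dst[i] could then change a later src[j], which
 * is a possible loop-carried dependence of the kind that can block
 * automatic parallelization.
 *
 * With `restrict`, the programmer guarantees that the two arrays do
 * not overlap, so every iteration is independent of the others.
 */
void scale_add(double *restrict dst, const double *restrict src,
               double factor, size_t n)
{
    assert(dst != NULL && src != NULL);

    for (size_t i = 0; i < n; ++i) {
        dst[i] += factor * src[i];
    }
}
```

The guarantee is a promise, not something the compiler checks: if a caller passes overlapping arrays, the behavior is undefined. That trade-off is typical of programmer-provided assertions, which shift the burden of proof from the compiler's static analysis to the programmer.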