Automatic parallelization for graphics processing units

Authors:
Alan Leung; Ondřej Lhoták; Ghulam Lashari
Affiliations:
University of Waterloo, Waterloo, ON, Canada; University of Waterloo, Waterloo, ON, Canada; University of Waterloo, Waterloo, ON, Canada
Venue:
PPPJ '09 Proceedings of the 7th International Conference on Principles and Practice of Programming in Java
Year:
2009

Citing 28

Supercompilers for parallel and vector computers
Supercompilers for parallel and vector computers

The structure of parafrase-2: an advanced parallelizing compiler for C and FORTRAN
Selected papers of the second workshop on Languages and compilers for parallel computing

Simple vector microprocessors for multimedia applications
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture

Automatic loop transformations and parallelization for Java
Proceedings of the 14th international conference on Supercomputing

Exploiting superword level parallelism with multimedia instruction sets
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation

Compilation techniques for multimedia processors
International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, Part 1

A vectorizing compiler for multimedia extensions
International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, Part 1

Adaptive optimization in the Jalapeño JVM
OOPSLA '00 Proceedings of the 15th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications

A comparison of three approaches to language, compiler, and library support for multidimensional arrays in Java
Proceedings of the 2001 joint ACM-ISCOPE conference on Java Grande

Optimizing compilers for modern architectures: a dependence-based approach
Optimizing compilers for modern architectures: a dependence-based approach

High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing

Automatic intra-register vectorization for the Intel architecture
International Journal of Parallel Programming

Automatic Parallelization for Non-cache Coherent Multiprocessors
LCPC '96 Proceedings of the 9th International Workshop on Languages and Compilers for Parallel Computing

Polaris: Improving the Effectiveness of Parallelizing Compilers
LCPC '94 Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing

Compiler-Controlled Caching in Superword Register Files for Multimedia Extension Architectures
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques

JavaSpMT: A Speculative Thread Pipelining Parallelization Model for Java Programs
IPDPS '00 Proceedings of the 14th International Symposium on Parallel and Distributed Processing

Vectorizing for a SIMdD DSP architecture
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems

The Software Vectorization Handbook: Applying Intel Multimedia Extensions for Maximum Performance
The Software Vectorization Handbook: Applying Intel Multimedia Extensions for Maximum Performance

The Jalapeño virtual machine
IBM Systems Journal

Brook for GPUs: stream computing on graphics hardware
ACM SIGGRAPH 2004 Papers

Optimizing Compiler for the CELL Processor
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques

SableSpMT: a software framework for analysing speculative multithreading in Java
PASTE '05 Proceedings of the 6th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering

Loop Parallelisation for the Jikes RVM
PDCAT '05 Proceedings of the Sixth International Conference on Parallel and Distributed Computing Applications and Technologies

Accelerator: using data parallelism to program GPUs for general-purpose uses
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems

General-purpose GPU computing: practice and experience
Proceedings of the 2006 ACM/IEEE conference on Supercomputing

Introducing Control Flow into Vectorized Code
PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques

Automatically translating a general purpose C++ image processing library for GPUs
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing

Optimizing chip multiprocessor work distribution using dynamic compilation
Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing

Cited 8

Multi-GPU and multi-CPU parallelization for interactive physics simulations
Euro-Par'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part II

The architecture of the DecentVM: towards a decentralized virtual machine for many-core computing
Virtual Machines and Intermediate Languages

Mathematical morphology in computer graphics, scientific visualization and visual exploration
ISMM'11 Proceedings of the 10th international conference on Mathematical morphology and its applications to image and signal processing

Paragon: collaborative speculative loop execution on GPU and CPU
Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units

Adaptive input-aware compilation for graphics engines
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation

KFusion: optimizing data flow without compromising modularity
Proceedings of the 12th annual international conference on Aspect-oriented software development

Parallel execution of Java loops on Graphics Processing Units
Science of Computer Programming

Leveraging GPUs using cooperative loop speculation
ACM Transactions on Architecture and Code Optimization (TACO)

Quantified Score

Hi-index	0.00

Abstract

Accelerated graphics cards, or Graphics Processing Units (GPUs), have become ubiquitous in recent years. On the right kinds of problems, GPUs greatly surpass CPUs in raw performance. However, because they are difficult to program, GPUs are used only for a narrow class of special-purpose applications, and the raw processing power they provide goes unused most of the time. This paper presents an extension to a Java JIT compiler that executes suitable code on the GPU instead of the CPU. Both static and dynamic features are used to decide whether it is feasible and beneficial to offload a piece of code to the GPU. The paper presents a cost model that balances the speedup available from the GPU against the cost of transferring input and output data between main memory and GPU memory. The cost model is parameterized so that it can be applied to different hardware combinations. The paper also presents ways to overcome several obstacles to parallelization inherent in the design of the Java bytecode language: unstructured control flow, the lack of multi-dimensional arrays, precise exception semantics, and the proliferation of indirect references.
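
The trade-off the abstract describes, GPU speedup weighed against the cost of moving data between main memory and GPU memory, can be illustrated with a minimal sketch. This is not the paper's cost model: the class name OffloadCostModel, its four parameters, and the simple linear formula are all assumptions chosen only to show how a parameterized model of this kind might be structured.

```java
/**
 * Hypothetical offload cost model (illustrative only, not the paper's model).
 * Every parameter would have to be calibrated for a given CPU/GPU combination.
 */
final class OffloadCostModel {
    private final double cpuSecondsPerIteration; // assumed CPU cost of one loop iteration
    private final double gpuSecondsPerIteration; // assumed effective GPU cost per iteration
    private final double transferBytesPerSecond; // assumed main-memory <-> GPU-memory bandwidth
    private final double fixedOverheadSeconds;   // assumed fixed cost of each offload

    OffloadCostModel(double cpuSecondsPerIteration,
                     double gpuSecondsPerIteration,
                     double transferBytesPerSecond,
                     double fixedOverheadSeconds) {
        this.cpuSecondsPerIteration = cpuSecondsPerIteration;
        this.gpuSecondsPerIteration = gpuSecondsPerIteration;
        this.transferBytesPerSecond = transferBytesPerSecond;
        this.fixedOverheadSeconds = fixedOverheadSeconds;
    }

    /** Estimated time to run the loop on the CPU. */
    double cpuSeconds(long iterations) {
        return iterations * cpuSecondsPerIteration;
    }

    /** Estimated time to run the loop on the GPU, including copying inputs in and outputs back. */
    double gpuSeconds(long iterations, long inputBytes, long outputBytes) {
        double transferSeconds = (inputBytes + outputBytes) / transferBytesPerSecond;
        return fixedOverheadSeconds + transferSeconds + iterations * gpuSecondsPerIteration;
    }

    /** Offload only when the GPU estimate, transfers included, beats the CPU estimate. */
    boolean shouldOffload(long iterations, long inputBytes, long outputBytes) {
        return gpuSeconds(iterations, inputBytes, outputBytes) < cpuSeconds(iterations);
    }
}
```

Under these assumed parameters, a short loop stays on the CPU because the fixed overhead dominates, while a long loop with a large per-iteration speedup is offloaded. A loop whose GPU advantage is small can also stay on the CPU when the transfer time outweighs the computation saved.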