Paragon: collaborative speculative loop execution on GPU and CPU

Authors:
Mehrzad Samadi; Amir Hormati; Janghaeng Lee; Scott Mahlke
Affiliations:
University of Michigan - Ann Arbor, MI; Microsoft Research, Microsoft, Inc. - Redmond, WA; University of Michigan - Ann Arbor, MI; University of Michigan - Ann Arbor, MI
Venue:
Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units
Year:
2012

Abstract

The rise of graphics engines as one of the main parallel platforms for general-purpose computing has ignited a wide search for better programming support for GPUs. Because of their non-traditional execution model, developing applications for GPUs is usually very challenging, and as a result these devices are left under-utilized in many commodity systems. Several languages, such as CUDA, have emerged to address this challenge, but past research has shown that developing applications in these languages is a daunting task, either because of the tedious performance optimization cycle or because inherent algorithmic characteristics of an application can make it unsuitable for GPUs. In addition, previous approaches that automatically generate optimized parallel CUDA code for GPUs using complex compilation techniques have failed to utilize the GPUs present in everyday computing devices such as laptops and mobile systems. In this work, we take a different approach. Although it is hard to generate optimized code for GPUs, their high raw performance compared to CPUs makes it beneficial to use them speculatively rather than leave them idle. To achieve this goal, we propose Paragon: a collaborative static/dynamic compiler platform that speculatively runs possibly-data-parallel pieces of sequential applications on GPUs. Paragon uses the GPU opportunistically for loops that its loop classification phase categorizes as possibly-data-parallel. While running a loop speculatively, Paragon monitors dependencies using a light-weight kernel management unit and transfers execution to the CPU if a conflict is detected. Paragon resumes execution on the GPU after the dependency is executed sequentially on the CPU. Our experiments show that Paragon achieves up to 12x speedup compared to unsafe CPU execution with 4 threads.
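
The speculate, detect, fall back, and resume cycle described in the abstract can be illustrated with a small CPU-only simulation. This is a minimal sketch under stated assumptions, not Paragon's implementation: the loop shape `a[writes[i]] = f(a[reads[i]])`, the chunk size, and the names `run_sequential`, `first_conflict`, and `run_speculative` are all hypothetical, and the "GPU" phase is modeled as iterations that read one shared snapshot and buffer their writes.

```python
def run_sequential(a, reads, writes, f):
    """Reference semantics: a[writes[i]] = f(a[reads[i]]) in iteration order."""
    a = list(a)
    for r, w in zip(reads, writes):
        a[w] = f(a[r])
    return a


def first_conflict(reads, writes, start, stop):
    """Return the first iteration in [start, stop) that reads or writes a cell
    written by an earlier iteration of the same chunk, or None if there is none."""
    written = set()
    for i in range(start, stop):
        if reads[i] in written or writes[i] in written:
            return i
        written.add(writes[i])
    return None


def run_speculative(a, reads, writes, f, chunk=4):
    """Simulate chunked speculative execution with sequential fallback.

    Returns the final array and a log of ("gpu" | "cpu", start, stop) phases.
    """
    a = list(a)
    n = len(reads)
    log = []
    i = 0
    while i < n:
        stop = min(i + chunk, n)
        # Speculative phase (stands in for the GPU): every iteration of the
        # chunk reads the same snapshot and buffers its write.
        snapshot = list(a)
        buffered = [(writes[k], f(snapshot[reads[k]])) for k in range(i, stop)]
        # Dependency check (stands in for the monitoring step).
        conflict = first_conflict(reads, writes, i, stop)
        safe_stop = stop if conflict is None else conflict
        # Commit only the conflict-free prefix; later results are discarded.
        for cell, value in buffered[: safe_stop - i]:
            a[cell] = value
        log.append(("gpu", i, safe_stop))
        if conflict is None:
            i = stop
        else:
            # Fallback: run the dependent iteration sequentially (the CPU),
            # then resume speculation at the next iteration.
            a[writes[conflict]] = f(a[reads[conflict]])
            log.append(("cpu", conflict, conflict + 1))
            i = conflict + 1
    return a, log
```

Calling `run_speculative(list(range(8)), [0, 1, 2, 2, 4, 5, 3, 7], list(range(8)), lambda x: x + 10)` exercises one fallback: iteration 3 reads the cell that iteration 2 writes, so it is re-executed sequentially before speculation resumes. The first iteration of a chunk can never conflict, so each pass through the loop makes progress, and the result always matches `run_sequential`.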