iGPU: exception support and speculative execution on GPUs

Authors:
Jaikrishnan Menon;Marc De Kruijf;Karthikeyan Sankaralingam
Affiliations:
University of Wisconsin-Madison;University of Wisconsin-Madison;University of Wisconsin-Madison
Venue:
Proceedings of the 39th Annual International Symposium on Computer Architecture
Year:
2012

Abstract

Since the introduction of fully programmable vertex shader hardware, GPU computing has made tremendous advances. Exception support and speculative execution are the next steps to expand the scope and improve the usability of GPUs. However, traditional mechanisms to support exceptions and speculative execution are highly intrusive to GPU hardware design. This paper builds on two related insights to provide a unified, lightweight mechanism for supporting exceptions and speculation on GPUs. First, we observe that GPU programs can be broken into code regions that contain little or no live register state at their entry point. Second, we recognize that it is simple to generate these regions so that they are idempotent, which allows their entry points to function as program recovery points and enables support for exception handling, fast context switches, and speculation, all with very low overhead. We call the architecture of GPUs executing these idempotent regions the iGPU architecture. The hardware extensions required are minimal, and the construction of idempotent code regions is fully transparent under the typical dynamic compilation framework of GPUs. We demonstrate how iGPU exception support enables virtual memory paging with very low overhead (1% to 4%), and how speculation support enables circuit-speculation techniques that can reduce energy by over 25%.
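
To make the idea of an idempotent region acting as a recovery point concrete, here is a minimal Python sketch. It is only an illustration of the general property, not the paper's compiler analysis or hardware mechanism. All names (`SimulatedFault`, `idempotent_region`, `clobbering_region`, `run_with_recovery`, `fresh_memory`) are hypothetical, and the dictionary `mem` stands in for GPU memory. The assumption illustrated is that a region which never overwrites its own inputs can be restarted from its entry point after a fault without restoring any state.

```python
class SimulatedFault(Exception):
    """Hypothetical stand-in for a page fault or a detected timing error."""


def idempotent_region(mem, fault_at=None):
    """Reads mem['a'] and mem['b'], writes only mem['out']."""
    for i in range(len(mem["a"])):
        if i == fault_at:
            raise SimulatedFault(i)
        mem["out"][i] = mem["a"][i] + mem["b"][i]


def clobbering_region(mem, fault_at=None):
    """Overwrites its own input mem['a'] in place, so it is not idempotent."""
    for i in range(len(mem["a"])):
        if i == fault_at:
            raise SimulatedFault(i)
        mem["a"][i] = mem["a"][i] + mem["b"][i]


def run_with_recovery(region, mem, fault_at):
    """Run a region; on a fault, restart it from its entry point."""
    try:
        region(mem, fault_at)
    except SimulatedFault:
        # Recovery is plain re-execution: no checkpoint is restored.
        region(mem)
    return mem


def fresh_memory():
    """Small illustrative memory image (values chosen arbitrarily)."""
    return {"a": [1, 2, 3], "b": [10, 20, 30], "out": [0, 0, 0]}


if __name__ == "__main__":
    # Fault injected before the third element is processed.
    print(run_with_recovery(idempotent_region, fresh_memory(), 2)["out"])
    print(run_with_recovery(clobbering_region, fresh_memory(), 2)["a"])
```

In this sketch the idempotent region produces `[11, 22, 33]` whether or not a fault interrupts it, while the in-place version yields `[21, 42, 33]` after a restart because the first two elements are updated twice. That difference is why, under this simplified model, only regions of the first kind can use their entry point as a recovery point.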