Instruction issue logic for high-performance, interruptable pipelined processors
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
The IBM System/370 Vector Architecture: Design Considerations
IEEE Transactions on Computers
Implementing Precise Interrupts in Pipelined Processors
IEEE Transactions on Computers
Two-level adaptive training branch prediction
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Compiler-Based Multiple Instruction Retry
IEEE Transactions on Computers
The impact of architectural trends on operating system performance
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Register renaming and dynamic speculation: an alternative approach
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Out-of-order vector architectures
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Memory exclusion: optimizing the performance of checkpointing systems
Software—Practice & Experience
DIVA: a reliable substrate for deep submicron microarchitecture design
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Cost reduction and evaluation of temporary faults detecting technique
DATE '00 Proceedings of the conference on Design, automation and test in Europe
A user-programmable vertex engine
Proceedings of the 28th annual conference on Computer graphics and interactive techniques
Tarantula: a vector extension to the alpha architecture
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Dual use of superscalar datapath for transient-fault detection and recovery
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
Interrupt Handling for Out-of-Order Execution Processors
IEEE Transactions on Computers
Overcoming the limitations of conventional vector processors
Proceedings of the 30th annual international symposium on Computer architecture
Efficient Exception Handling Techniques for High-Performance Processor Architectures
Efficient Exception Handling Techniques for High-Performance Processor Architectures
Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 43rd annual Design Automation Conference
Implementing virtual memory in a vector processor with software restart markers
Proceedings of the 20th annual international conference on Supercomputing
Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware
An asymmetric distributed shared memory model for heterogeneous parallel systems
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Dynamic warp subdivision for integrated branch and memory divergence tolerance
Proceedings of the 37th annual international symposium on Computer architecture
Ocelot: a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Computer Architecture, Fifth Edition: A Quantitative Approach
Computer Architecture, Fifth Edition: A Quantitative Approach
Idempotent processor architecture
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Static analysis and compiler design for idempotent processing
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Supporting virtual memory in GPGPU without supporting precise exceptions
Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Static analysis and compiler design for idempotent processing
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
RSVM: a region-based software virtual memory for GPU
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Leveraging GPUs using cooperative loop speculation
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
Since the introduction of fully programmable vertex shader hardware, GPU computing has made tremendous advances. Exception support and speculative execution are the next steps to expand the scope and improve the usability of GPUs. However, traditional mechanisms to support exceptions and speculative execution are highly intrusive to GPU hardware design. This paper builds on two related insights to provide a unified lightweight mechanism for supporting exceptions and speculation on GPUs. First, we observe that GPU programs can be broken into code regions that contain little or no live register state at their entry point. We then also recognize that it is simple to generate these regions in such a way that they are idempotent, allowing their entry points to function as program recovery points and enabling support for exception handling, fast context switches, and speculation, all with very low overhead. We call the architecture of GPUs executing these idempotent regions the iGPU architecture. The hardware extensions required are minimal and the construction of idempotent code regions is fully transparent under the typical dynamic compilation framework of GPUs. We demonstrate how iGPU exception support enables virtual memory paging with very low overhead (1% to 4%), and how speculation support enables circuit-speculation techniques that can provide over 25% reduction in energy.