Implementing Precise Interrupts in Pipelined Processors

Authors: James E. Smith; Andrew R. Pleszkun
Affiliations: Univ. of Wisconsin, Madison; Univ. of Wisconsin, Madison
Venue: IEEE Transactions on Computers
Year: 1988

Abstract

Five solutions to the precise interrupt problem in pipelined processors are described and evaluated. An interrupt is precise if the saved process state corresponds to a sequential model of program execution in which one instruction completes before the next begins. In a pipelined processor, precise interrupts are difficult to implement because an instruction may be initiated before its predecessors have completed. The first solution forces instructions to complete and modify the process state in architectural order. The other four solutions allow instructions to complete in any order, but additional hardware is used, so that a precise state can be restored when an interrupt occurs. All the methods are discussed in the context of a parallel pipeline structure. Simulation results for the Cray-1S scalar architecture are used to show that the first solution results in a performance degradation of at least 16%. The remaining four solutions offer better performance, and three of them result in as little as a 3% performance loss. Several extensions, including vector architectures, virtual memory, and linear pipeline structures, are briefly discussed.
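The abstract does not spell out the hardware behind the four out-of-order solutions, so the following is only a minimal sketch of the general idea it states: let instructions finish in any order, but buffer their results so that the architectural state is updated in program order. The class `ReorderBuffer`, its method names, and the register names are hypothetical and chosen for illustration; this is a toy model under those assumptions, not the authors' design or their simulated Cray-1S configuration.

```python
from collections import OrderedDict


class ReorderBuffer:
    """Toy model: results are buffered and committed in program order."""

    def __init__(self):
        self.pending = OrderedDict()  # tag -> entry, kept in issue (program) order
        self.regs = {}                # architectural (precise) register state
        self.next_tag = 0

    def issue(self, dest):
        """Allocate an entry for an instruction that writes register `dest`."""
        tag = self.next_tag
        self.next_tag += 1
        self.pending[tag] = {"dest": dest, "value": None,
                             "done": False, "fault": False}
        return tag

    def complete(self, tag, value=None, fault=False):
        """Record a result (or an exception); may be called in any order."""
        entry = self.pending[tag]
        entry["value"], entry["done"], entry["fault"] = value, True, fault

    def commit(self):
        """Retire finished instructions from the head, oldest first.

        Returns the tag of a faulting instruction once it reaches the head,
        otherwise None. On a fault, the faulting instruction and everything
        younger are discarded, so `regs` reflects exactly the instructions
        that precede the fault in program order.
        """
        while self.pending:
            tag, entry = next(iter(self.pending.items()))
            if not entry["done"]:
                return None           # oldest instruction still executing
            if entry["fault"]:
                self.pending.clear()  # squash faulting and younger entries
                return tag
            self.regs[entry["dest"]] = entry["value"]
            del self.pending[tag]
        return None


rob = ReorderBuffer()
t0, t1, t2 = rob.issue("r1"), rob.issue("r2"), rob.issue("r3")
rob.complete(t2, 30)           # youngest instruction finishes first
rob.complete(t1, fault=True)   # middle instruction raises an exception
rob.commit()                   # nothing retires: t0 is still executing
rob.complete(t0, 10)
faulting_tag = rob.commit()    # t0 retires, then the fault at t1 is reported
```

In this sketch the result of the youngest instruction (`r3`) never reaches `regs`, even though it finished first, which is the property the abstract calls a precise state: the saved state matches a sequential execution that stopped just before the faulting instruction.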