A Simple Mechanism for Detecting Ineffectual Instructions in Slipstream Processors

Authors:
Jinson J. Koppanalil; Eric Rotenberg
Affiliations:
-;-
Venue:
IEEE Transactions on Computers
Year:
2004

Abstract

A slipstream processor accelerates a program by speculatively removing repeatedly ineffectual instructions. Detecting the roots of ineffectual computation (unreferenced writes, nonmodifying writes, and correctly predicted branches) is straightforward. On the other hand, detecting ineffectual instructions in the backward slices of these root instructions currently requires complex back-propagation circuitry. We observe that, by logically monitoring the speculative program (instead of the original program), back-propagation can be reduced to detecting unreferenced writes. That is, once root instructions are actually removed, instructions at the next higher level in the backward slice become newly exposed unreferenced writes in the speculative program. This new algorithm, called implicit back-propagation, eliminates complex hardware and achieves an average performance improvement of 11.8 percent, only marginally lower than the 12.3 percent improvement achieved with explicit back-propagation. We further simplify the hardware component by electing not to detect ineffectual memory writes, focusing only on ineffectual register writes. A minimal implementation consisting of only a register-indexed table (similar to an architectural register file) achieves a good balance between complexity and performance (11.2 percent average performance improvement with implicit back-propagation and without detection of ineffectual memory writes).
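
The idea of implicit back-propagation described in the abstract can be illustrated with a small software sketch. This is not the authors' hardware design; it is a toy model under stated assumptions: the names `Instr`, `find_ineffectual`, and `LOOP_BODY` are hypothetical, an instruction is removed as soon as it is detected once (no confidence counters), and only unreferenced register writes are tracked (nonmodifying writes, branches, and memory writes are ignored). A register-indexed table records the last writer of each register and whether that value has been read. Because removed instructions are skipped, they no longer reference their source registers, so their producers show up as unreferenced writes on a later pass.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    pc: int                 # hypothetical instruction identifier
    dest: Optional[int]     # destination register, or None (e.g. a store)
    srcs: Tuple[int, ...]   # source registers


def find_ineffectual(loop_body, passes):
    """Toy model: monitor the *speculative* program for unreferenced writes.

    table[reg] = [writer_pc, referenced] plays the role of the
    register-indexed table. Removed instructions are skipped entirely,
    so they neither read their sources nor write their destination.
    """
    table = {}
    removed = set()
    for _ in range(passes):
        for ins in loop_body:
            if ins.pc in removed:
                continue  # not part of the speculative program any more
            # Reads mark the current producer of each source as referenced.
            for reg in ins.srcs:
                if reg in table:
                    table[reg][1] = True
            # A write that kills a never-read value exposes an unreferenced write.
            if ins.dest is not None:
                prev = table.get(ins.dest)
                if prev is not None and not prev[1]:
                    removed.add(prev[0])
                table[ins.dest] = [ins.pc, False]
    return removed


# Assumed example loop body: pc 2 writes r3, which nothing ever reads;
# pc 1 feeds only pc 2, and pc 0 feeds only pc 1.
LOOP_BODY = [
    Instr(pc=0, dest=1, srcs=(5,)),
    Instr(pc=1, dest=2, srcs=(1,)),
    Instr(pc=2, dest=3, srcs=(2,)),
    Instr(pc=3, dest=4, srcs=(5, 6)),
    Instr(pc=4, dest=None, srcs=(4,)),   # store of r4, always kept
]

print(sorted(find_ineffectual(LOOP_BODY, passes=6)))  # [0, 1, 2]
```

In this toy trace the backward slice is peeled one level at a time: pc 2 is detected first, and pc 1 and pc 0 follow on later passes once their only consumers have been removed. No explicit walk of the dependence chain is needed.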