Slice-processors: an implementation of operation-based prediction

Authors:
Andreas Moshovos;Dionisios N. Pnevmatikatos;Amirali Baniasadi
Affiliations:
Electrical and Computer Engineering, University of Toronto;Electronic and Computer Engineering, Technical University of Crete;Electrical and Computer Engineering, Northwestern University
Venue:
ICS '01 Proceedings of the 15th international conference on Supercomputing
Year:
2001

Citing 15
Cited 35

Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Examination of a memory access classification scheme for pointer-intensive and numeric programs

ICS '96 Proceedings of the 10th international conference on Supercomputing
Memory-system design considerations for dynamically-scheduled processors

Proceedings of the 24th annual international symposium on Computer architecture
Dataflow analysis of branch mispredictions and its application to early resolution of branch outcomes

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Dependence based prefetching for linked data structures

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Simultaneous subordinate microthreading (SSMT)

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Improving virtual function call target prediction via dependence-based pre-computation

ICS '99 Proceedings of the 13th international conference on Supercomputing
Understanding the backward slices of performance degrading instructions

Proceedings of the 27th annual international symposium on Computer architecture
Instruction path coprocessors

Proceedings of the 27th annual international symposium on Computer architecture
Execution-based prediction using speculative slices

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Slipstream processors: improving both performance and fault tolerance

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Decoupled access/execute computer architectures

ISCA '82 Proceedings of the 9th annual symposium on Computer Architecture
The Alpha 21264 Microprocessor Architecture

ICCD '98 Proceedings of the International Conference on Computer Design
Speculative Data-Driven Multithreading

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Memory dependence prediction

Memory dependence prediction

Post-pass binary adaptation for software-based speculative precomputation

PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Dynamic speculative precomputation

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
A Decoupled Predictor-Directed Stream Prefetching Architecture

IEEE Transactions on Computers
A Programmable Memory Hierarchy for Prefetching Linked Data Structures

ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Pointer cache assisted prefetching

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Microarchitectural support for precomputation microthreads

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
A quantitative framework for automated pre-execution thread selection

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
A framework for modeling and optimization of prescient instruction prefetch

SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
A Simple Mechanism for Detecting Ineffectual Instructions in Slipstream Processors

IEEE Transactions on Computers
Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
A general framework for prefetch scheduling in linked data structures and its application to multi-chain prefetching

ACM Transactions on Computer Systems (TOCS)
Data forwarding through in-memory precomputation threads

Proceedings of the 18th annual international conference on Supercomputing
Microarchitecture Optimizations for Exploiting Memory-Level Parallelism

Proceedings of the 31st annual international symposium on Computer architecture
A study of source-level compiler algorithms for automatic construction of pre-execution code

ACM Transactions on Computer Systems (TOCS)
Helper threads via virtual multithreading on an experimental itanium® 2 processor-based platform

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Dynamic Strands: Collapsing Speculative Dependence Chains for Reducing Pipeline Communication

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Energy-Effectiveness of Pre-Execution and Energy-Aware P-Thread Selection

Proceedings of the 32nd annual international symposium on Computer Architecture
Design and Implementation of a Compiler Framework for Helper Threading on Multi-core Processors

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Future Execution: A Hardware Prefetching Technique for Chip Multiprocessors

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
A Self-Repairing Prefetcher in an Event-Driven Dynamic Optimization Framework

Proceedings of the International Symposium on Code Generation and Optimization
Efficient emulation of hardware prefetchers via event-driven helper threading

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Speculative pre-execution assisted by compiler (SPEAR)

Journal of Parallel and Distributed Computing - Special issue on parallel bioinspired algorithms
Block-aware instruction set architecture

ACM Transactions on Architecture and Code Optimization (TACO)
Design and evaluation of a hierarchical decoupled architecture

The Journal of Supercomputing
SlicK: slice-based locality exploitation for efficient redundant multithreading

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Future execution: A prefetching mechanism that uses multiple cores to speed up single threads

ACM Transactions on Architecture and Code Optimization (TACO)
Accelerating sequential programs on Chip Multiprocessors via Dynamic Prefetching Thread

Microprocessors & Microsystems
Ubiquitous memory introspection

Proceedings of the International Symposium on Code Generation and Optimization
Optimization of data prefetch helper threads with path-expression based statistical modeling

Proceedings of the 21st annual international conference on Supercomputing
A low-complexity microprocessor design with speculative pre-execution

Journal of Systems Architecture: the EUROMICRO Journal
A performance-correctness explicitly-decoupled architecture

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
OUTRIDER: efficient memory latency tolerance with decoupled strands

Proceedings of the 38th annual international symposium on Computer architecture
Efficiently exploiting memory level parallelism on asymmetric coupled cores in the dark silicon era

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
A hybrid hardware/software generated prefetching thread mechanism on chip multiprocessors

Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing

Quantified Score

Hi-index	0.01

Visualization

Abstract

We describe the Slice Processor micro-architecture that implements a generalized operation-based prefetching mechanism. Operation-based prefetchers predict the series of operations, or the computation slice that can be used to calculate forthcoming memory references. This is in contrast to outcome-based predictors that exploit regularities in the (address) outcome stream. Slice processors are a generalization of existing operation-based prefetching mechanisms such as stream buffers where the operation itself is fixed in the design (e.g., address + stride). A slice processor dynamically identifies frequently missing loads and extracts on-the-fly the relevant address computation slices. Such slices are then executed in-parallel with the main sequential thread prefetching memory data. We describe the various support structures and emphasize the design of the slice detection mechanism. We demonstrate that a relatively simple organization can significantly improve performance over an aggressive, dynamically-scheduled processor and for a set of pointer-intensive programs and for some integer applications from the SPEC'95 suite. In particular, a slice processor that can detect slices of up to 8 instructions extracted over of a region of up to 32 instructions improves performance by 11% on the average (even if slice detection requires up to 32 cycles). Allowing slices of up to 16 instructions results in an average performance improvement of 15%. Finally, we study how our operation-based predictor interacts with an outcome-based one and find them mutually beneficial.