Simultaneous speculative threading: a novel pipeline architecture implemented in sun's rock processor

Authors:
Shailender Chaudhry;Robert Cypher;Magnus Ekman;Martin Karlsson;Anders Landin;Sherman Yip;Håkan Zeffer;Marc Tremblay
Affiliations:
Sun Microsystems, Inc., Santa Clara, CA, USA;Sun Microsystems, Inc., Santa Clara, CA, USA;Sun Microsystems, Inc., Santa Clara, CA, USA;Sun Microsystems, Inc., Santa Clara, CA, USA;Sun Microsystems, Inc., Santa Clara, CA, USA;Sun Microsystems, Inc., Santa Clara, CA, USA;Sun Microsystems, Inc., Santa Clara, CA, USA;Sun Microsystems, Inc., Santa Clara, CA, USA
Venue:
Proceedings of the 36th annual international symposium on Computer architecture
Year:
2009

Citing 23
Cited 16

Transactional memory: architectural support for lock-free data structures

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Dynamic memory disambiguation using the memory conflict buffer

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Multiscalar processors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The case for a single-chip multiprocessor

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Improving data cache performance by pre-executing instructions under a cache miss

ICS '97 Proceedings of the 11th international conference on Supercomputing
Complexity-effective superscalar processors

Proceedings of the 24th annual international symposium on Computer architecture
Data speculation support for a chip multiprocessor

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
A Chip-Multiprocessor Architecture with Speculative Multithreading

IEEE Transactions on Computers
Piranha: a scalable architecture based on single-chip multiprocessing

Proceedings of the 27th annual international symposium on Computer architecture
A large, fast instruction window for tolerating cache misses

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
The Alpha 21264 Microprocessor

IEEE Micro
The MAJC Architecture: A Synthesis of Parallelism and Scalability

IEEE Micro
Cherry: checkpointed early resource recycling in out-of-order microprocessors

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Beating in-order stalls with "flea-flicker" two-pass pipelining

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Continual flow pipelines

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Mitosis compiler: an infrastructure for speculative threading based on pre-computation slices

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Niagara: A 32-Way Multithreaded Sparc Processor

IEEE Micro
High-Performance Throughput Computing

IEEE Micro
Kilo-Instruction Processors: Overcoming the Memory Wall

IEEE Micro
Efficient Runahead Execution: Power-Efficient Memory Latency Tolerance

IEEE Micro
Rock: A High-Performance Sparc CMT Processor

IEEE Micro

BulkCompiler: high-performance sequential consistency through cooperative compiler and hardware support

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Early experience with a commercial hardware transactional memory implementation

Early experience with a commercial hardware transactional memory implementation
RETCON: transactional repair without replay

Proceedings of the 37th annual international symposium on Computer architecture
Hardware acceleration of transactional memory on commodity systems

Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
OUTRIDER: efficient memory latency tolerance with decoupled strands

Proceedings of the 38th annual international symposium on Computer architecture
Chameleon: operating system support for dynamic processors

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
A case for unlimited watchpoints

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Complementing user-level coarse-grain parallelism with implicit speculative parallelism

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Distributed replay protocol for distributed uniprocessors

Proceedings of the 26th ACM international conference on Supercomputing
Revisiting hardware-assisted page walks for virtualized systems

Proceedings of the 39th Annual International Symposium on Computer Architecture
Mixed speculative multithreaded execution models

ACM Transactions on Architecture and Code Optimization (TACO)
Disjoint out-of-order execution processor

ACM Transactions on Architecture and Code Optimization (TACO)
Evaluation of Blue Gene/Q hardware support for transactional memories

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Tuning the continual flow pipeline architecture

Proceedings of the 27th international ACM conference on International conference on supercomputing
Virtual register renaming: energy efficient substrate for continual flow pipelines

Proceedings of the 23rd ACM international conference on Great lakes symposium on VLSI
Tuning the continual flow pipeline architecture with virtual register renaming

ACM Transactions on Architecture and Code Optimization (TACO)

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper presents Simultaneous Speculative Threading (SST), which is a technique for creating high-performance area- and power-efficient cores for chip multiprocessors. SST hardware dynamically extracts two threads of execution from a single sequential program (one consisting of a load miss and its dependents, and the other consisting of the instructions that are independent of the load miss) and executes them in parallel. SST uses an efficient checkpointing mechanism to eliminate the need for complex and power-inefficient structures such as register renaming logic, reorder buffers, memory disambiguation buffers, and large issue windows. Simulations of certain SST implementations show 18% better per-thread performance on commercial benchmarks than larger and higher-powered out-of-order cores. Sun Microsystems' ROCK processor, which is the first processor to use SST cores, has been implemented and is scheduled to be commercially available in 2009.