Disjoint out-of-order execution processor

Authors:
Mageda Sharafeddine;Komal Jothi;Haitham Akkary
Affiliations:
American University of Beirut, Lebanon;American University of Beirut, Lebanon;American University of Beirut, Lebanon
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2012

Citing 60
Cited 2

An efficient method of computing static single assignment form

POPL '89 Proceedings of the 16th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Branch history table prediction of moving target branches due to subroutine returns

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Limits of control flow on parallelism

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The expandable split window paradigm for exploiting fine-grain parallelsim

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The multiscalar architecture

The multiscalar architecture
Multiscalar processors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Single-program speculative multithreading (SPSM) architecture: compiler-assisted fine-grained multithreading

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
The case for a single-chip multiprocessor

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Dynamic speculation and synchronization of data dependences

Proceedings of the 24th annual international symposium on Computer architecture
Target prediction for indirect jumps

Proceedings of the 24th annual international symposium on Computer architecture
Trace processors

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Speculative multithreaded processors

ICS '98 Proceedings of the 12th international conference on Supercomputing
Memory dependence prediction using store sets

Proceedings of the 25th annual international symposium on Computer architecture
Task selection for a multiscalar processor

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
A dynamic multithreading processor

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Improving the performance of speculatively parallel applications on the Hydra CMP

ICS '99 Proceedings of the 13th international conference on Supercomputing
The limits of instruction level parallelism in SPEC95 applications

ACM SIGARCH Computer Architecture News - Special issue on Interact-3 workshop
The Superthreaded Processor Architecture

IEEE Transactions on Computers
Value prediction for speculative multithreaded architectures

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
A scalable approach to thread-level speculation

Proceedings of the 27th annual international symposium on Computer architecture
Clock rate versus IPC: the end of the road for conventional microarchitectures

Proceedings of the 27th annual international symposium on Computer architecture
Architecture of the Atlas Chip-Multiprocessor: Dynamically Parallelizing Irregular Applications

IEEE Transactions on Computers
Architectural support for scalable speculative parallelization in shared-memory multiprocessors

Proceedings of the 27th annual international symposium on Computer architecture
A large, fast instruction window for tolerating cache misses

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Skipper: a microarchitecture for exploiting control-flow independence

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Compiler optimization of scalar value communication between speculative threads

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Tuning the Pentium Pro Microarchitecture

IEEE Micro
Cherry: checkpointed early resource recycling in out-of-order microprocessors

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Master/slave speculative parallelization

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Lockup-free instruction fetch/prefetch cache organization

ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Control Flow Speculation in Multiscalar Processors

HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Performance Study of a Concurrent Multithreaded Processor

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
The Potential for Using Thread-Level Data Speculation to Facilitate Automatic Parallelization

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Speculative Versioning Cache

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Exploiting Method-Level Parallelism in Single-Threaded Java Programs

PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
On Dynamic Speculative Thread Partitioning and the MEM-Slicing Algorithm

PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
A Quantitative Assessment of Thread-Level Speculation Techniques

IPDPS '00 Proceedings of the 14th International Symposium on Parallel and Distributed Processing
Thread-Spawning Schemes for Speculative Multithreading

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Eliminating Squashes Through Learning Cross-Thread Violations in Speculative Parallelization for Multiprocessors

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Improving Value Communication for Thread-Level Speculation

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Continual flow pipelines

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
A Minimal Dual-Core Speculative Multi-Threading Architecture

ICCD '04 Proceedings of the IEEE International Conference on Computer Design
Control Flow Optimization Via Dynamic Reconvergence Prediction

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Mitosis compiler: an infrastructure for speculative threading based on pre-computation slices

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Exposing speculative thread parallelism in SPEC2000

Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Scalable Load and Store Processing in Latency Tolerant Processors

Proceedings of the 32nd annual international symposium on Computer Architecture
Out-of-Order Commit Processors

HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Reducing Branch Misprediction Penalty via Selective Branch Recovery

HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Kilo-Instruction Processors: Overcoming the Memory Wall

IEEE Micro
Compilers: Principles, Techniques, and Tools (2nd Edition)

Compilers: Principles, Techniques, and Tools (2nd Edition)
Core fusion: accommodating software diversity in chip multiprocessors

Proceedings of the 34th annual international symposium on Computer architecture
Transparent control independence (TCI)

Proceedings of the 34th annual international symposium on Computer architecture
Measuring the Parallelism Available for Very Long Instruction Word Architectures

IEEE Transactions on Computers
The Inhibition of Potential Parallelism by Conditional Jumps

IEEE Transactions on Computers
Accurate branch prediction for short threads

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Simultaneous speculative threading: a novel pipeline architecture implemented in sun's rock processor

Proceedings of the 36th annual international symposium on Computer architecture
WiDGET: Wisconsin decoupled grid execution tiles

Proceedings of the 37th annual international symposium on Computer architecture
Forwardflow: a scalable core for power-constrained CMPs

Proceedings of the 37th annual international symposium on Computer architecture

Tuning the continual flow pipeline architecture with virtual register renaming

ACM Transactions on Architecture and Code Optimization (TACO)
A thread partitioning approach for speculative multithreading

The Journal of Supercomputing

Quantified Score

Hi-index	0.00

Visualization

Abstract

High-performance superscalar architectures used to exploit instruction level parallelism in single-thread applications have become too complex and power hungry for the multicore processors era. We propose a new architecture that uses multiple small latency-tolerant out-of-order cores to improve single-thread performance. Improving single-thread performance with multiple small out-of-order cores allows designers to place more of these cores on the same die. Consequently, emerging highly parallel applications can take full advantage of the multicore parallel hardware without sacrificing performance of inherently serial and hard to parallelize applications. Our architecture combines speculative multithreading (SpMT) with checkpoint recovery and continual flow pipeline architectures. It splits single-thread program execution into disjoint control and data threads that execute concurrently on multiple cooperating small and latency-tolerant out-of-order cores. Hence we call this style of execution Disjoint Out-of-Order Execution (DOE). DOE uses latency tolerance to overcome performance issues of SpMT caused by interthread data dependences. To evaluate this architecture, we have developed a microarchitecture performance model of DOE based on PTLSim, a simulation infrastructure of the x86 instruction set architecture. We evaluate the potential performance of DOE processor architecture using a simple heuristic to fork control independent threads in hardware at the target addresses of future procedure return instructions. Using applications from SpecInt 2000, we study DOE under ideal as well as realistic architectural constraints. We discuss the performance impact of key DOE architecture and application variables such as number of cores, interthread data dependences, intercore data communication delay, buffers capacity, and branch mispredictions. Without any DOE specific compiler optimizations, our results show that DOE outperforms conventional SpMT architectures by 15%, on average. We also show that DOE with four small cores can perform on average equally well to a large superscalar core, consuming about the same power. Most importantly, DOE improves throughput performance by a significant amount over a large superscalar core, up to 2.5 times, when running multitasking applications.