On the potential of latency tolerant execution in speculative multithreading

Authors:
Haitham Akkary;Komal Jothi;Renjith Retnamma;Satyanarayana Nekkalapu;Doug Hall;Shahrokh Shahidzadeh
Affiliations:
American University of Beirut;Portland State University;Portland State University;Portland State University;Portland State University;Intel Corporation
Venue:
IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
Year:
2008

Abstract

High-performance superscalar architectures used to exploit instruction-level parallelism in single-thread applications have become too complex and too power-hungry for the many-core processor era. We propose a new architecture that uses multiple latency-tolerant in-order cores to improve single-thread performance, without requiring complex out-of-order execution hardware or large, power-hungry register files and instruction buffers. Using simple cores to provide improved single-thread performance for conventional difficult-to-parallelize applications allows designers to place many more of these cores on the same die. Consequently, emerging highly parallel applications can take full advantage of the many-core parallel hardware without sacrificing performance of inherently serial applications. Our architecture splits single-thread program execution into disjoint control and data threads that execute concurrently on multiple latency-tolerant in-order cores. Hence we call this style of execution Disjoint Out-of-Order Execution (DOE). DOE is a novel implementation of Speculative Multithreading (SpMT). It uses latency tolerance to overcome performance issues of SpMT caused by load imbalance and inter-thread data communication delays. Using control independence prediction hardware to spawn threads, we simulate the potential performance of DOE on a subset of SPEC2000 integer benchmarks under various parallelism scenarios and for DOE configurations of 2, 4, 6, and 8 single-issue latency-tolerant cores.