A low-complexity microprocessor design with speculative pre-execution

Authors:
Won W. Ro;Jean-Luc Gaudiot
Affiliations:
School of Electrical and Electronic Engineering, Yonsei University, 134 Shinchon-dong, Seodaemun-gu, Seoul, South Korea;Department of Electrical Engineering and Computer Science, University of California, Irvine, USA
Venue:
Journal of Systems Architecture: the EUROMICRO Journal
Year:
2008

Citing 44
Cited 0

The effectiveness of decoupling

ICS '93 Proceedings of the 7th international conference on Supercomputing
Hybrid slicing: an approach for refining static slices using dynamic information

SIGSOFT '95 Proceedings of the 3rd ACM SIGSOFT symposium on Foundations of software engineering
Multiscalar processors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Compiler-based prefetching for recursive data structures

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Complexity-effective superscalar processors

Proceedings of the 24th annual international symposium on Computer architecture
The multicluster architecture: reducing cycle time through partitioning

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Clustered speculative multithreaded processors

ICS '99 Proceedings of the 13th international conference on Supercomputing
A Chip-Multiprocessor Architecture with Speculative Multithreading

IEEE Transactions on Computers
A low-complexity issue logic

Proceedings of the 14th international conference on Supercomputing
Understanding the backward slices of performance degrading instructions

Proceedings of the 27th annual international symposium on Computer architecture
Clock rate versus IPC: the end of the road for conventional microarchitectures

Proceedings of the 27th annual international symposium on Computer architecture
On pipelining dynamic instruction scheduling logic

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Reducing the complexity of the issue logic

ICS '01 Proceedings of the 15th international conference on Supercomputing
Slice-processors: an implementation of operation-based prediction

ICS '01 Proceedings of the 15th international conference on Supercomputing
Execution-based prediction using speculative slices

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Speculative precomputation: long-range prefetching of delinquent loads

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Data prefetching by dependence graph precomputation

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Post-pass binary adaptation for software-based speculative precomputation

PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A large, fast instruction window for tolerating cache misses

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A scalable instruction queue design using dependence chains

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Dynamic speculative precomputation

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Design and evaluation of compiler algorithms for pre-execution

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Simultaneous Multithreading: A Platform for Next-Generation Processors

IEEE Micro
The Alpha 21264 Microprocessor

IEEE Micro
Hierarchical Scheduling Windows

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
A quantitative framework for automated pre-execution thread selection

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Using thread-level speculation to simplify manual parallelization

Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Cyclone: a broadcast-free dynamic instruction scheduler with selective replay

Proceedings of the 30th annual international symposium on Computer architecture
Data-Flow Prescheduling for Large Instruction Windows in Out-of-Order Processors

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Speculative Data-Driven Multithreading

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Memory Latency-Tolerance Approaches for Itanium Processors: Out-of-Order Execution vs.Speculative Precomputation

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
A Clustered Approach to Multithreaded Processors

IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Scaling the issue window with look-ahead latency prediction

Proceedings of the 18th annual international conference on Supercomputing
Exploring Wakeup-Free Instruction Scheduling

HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Tornado warning: the perils of selective replay in multithreaded processors

Proceedings of the 19th annual international conference on Supercomputing
Dual-Core Execution: Building a Highly Scalable Single-Thread Instruction Window

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Future Execution: A Hardware Prefetching Technique for Chip Multiprocessors

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Automatic Thread Extraction with Decoupled Software Pipelining

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Loop Scheduling with Complete Memory Latency Hiding on Multi-core Architecture

ICPADS '06 Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 1
Speculative pre-execution assisted by compiler (SPEAR)

Journal of Parallel and Distributed Computing - Special issue on parallel bioinspired algorithms
Distributed Microarchitectural Protocols in the TRIPS Prototype Processor

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
A low-complexity issue queue design with speculative pre-execution

HiPC'05 Proceedings of the 12th international conference on High Performance Computing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Current superscalar architectures strongly depend on an instruction issue queue to achieve multiple instruction issue and out-of-order execution. However, the issue queue requires a centralized structure and mainly causes globally broadcasting operations to wakeup and select the instructions. Therefore, a large issue queue ultimately results in a low clock rate along with a high circuit complexity. In other words, the increasing demands for a larger issue queue correspondingly impose a significant burden on achieving a higher clock speed. This paper discusses our Speculative Pre-Execution Assisted by compileR (SPEAR), a low-complexity issue queue design. SPEAR is designed to manage the small window superscalar architecture more efficiently without increasing the window size. To this end, we have first recognized that the long memory latency is one of the factors that demand a large window, and we aim at achieving early execution of the miss-causing load instructions using another hierarchy of an issue queue. We pre-execute those miss-causing instructions speculatively as an additional prefetching thread. Simulation results show that the SPEAR design achieves performance comparable to or even better than what would be obtained in superscalar architectures with a large issue queue. However, SPEAR is designed with smaller issue queues which consequently can be implemented with low hardware complexity and high clock speed.