A complexity-effective microprocessor design with decoupled dispatch queues and prefetching

Authors:
Won W. Ro;Jean-Luc Gaudiot
Affiliations:
School of Electrical and Electronic Engineering, Yonsei University, 134 Shinchon-dong, Seodaemun-gu, Seoul 120-749, Republic of Korea;Department of Electrical Engineering and Computer Science, University of California, Irvine, USA
Venue:
Parallel Computing
Year:
2009

Abstract

Continuing demand for high degrees of Instruction Level Parallelism (ILP) requires large dispatch queues (or centralized reservation stations) in modern superscalar microprocessors. However, large dispatch queues bring high circuit complexity, which in turn limits the pipeline clock rate: increasing the size of the dispatch queue ultimately hinders attempts to increase the clock speed. The reason is that most of today's designs are based on a centralized dispatch queue, which depends on globally broadcast operations to wake up and select the ready instructions. As an alternative to this conventional design, we propose hierarchically distributed dispatch queues based on access/execute decoupled architectures. Simulation results on 14 data-intensive benchmarks show that our DDQ (Decoupled Dispatch Queues) design achieves performance comparable to that of a superscalar machine with a large dispatch queue, while using small, distributed dispatch queues that can be implemented with low hardware complexity and high clock rates.
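The wakeup-and-select mechanism mentioned in the abstract can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the DDQ design evaluated in the paper: the names `Entry`, `DispatchQueue`, and `filled` are hypothetical, each entry is charged one tag comparison per broadcast, and a result is assumed to be broadcast only within the queue that needs it. In real hardware the comparators work in parallel and the cost appears as wire delay and fan-out, but the comparison count gives a rough feel for why a smaller queue is cheaper to wake up.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    dest: int          # tag of the result this instruction will produce
    waiting_on: set    # source tags that are not yet available


class DispatchQueue:
    """Toy dispatch queue with broadcast wakeup and oldest-first select."""

    def __init__(self):
        self.entries = []
        self.comparisons = 0  # tag comparisons performed so far

    def insert(self, dest, sources):
        self.entries.append(Entry(dest, set(sources)))

    def wakeup(self, tag):
        # The completed tag is broadcast to every entry in this queue.
        for entry in self.entries:
            self.comparisons += 1
            entry.waiting_on.discard(tag)

    def select(self):
        # Remove and return the oldest entry whose operands are all ready.
        for i, entry in enumerate(self.entries):
            if not entry.waiting_on:
                return self.entries.pop(i)
        return None


def filled(size):
    """Hypothetical helper: a queue whose entries wait on tag 0 or tag 1."""
    queue = DispatchQueue()
    for i in range(size):
        queue.insert(dest=100 + i, sources=[i % 2])
    return queue


# One centralized queue of 8 entries: the broadcast reaches all 8.
central = filled(8)
central.wakeup(0)

# Two queues of 4 entries (assumption: the result is needed in only one).
small_a, small_b = filled(4), filled(4)
small_a.wakeup(0)

print(central.comparisons, small_a.comparisons + small_b.comparisons)
```

In this model the single broadcast costs 8 comparisons in the centralized queue and 4 in the split configuration. The saving depends entirely on the assumption that a result rarely has to cross queues.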