Evaluation of the WM architecture
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
MISC: a Multiple Instruction Stream Computer
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
The effectiveness of decoupling
ICS '93 Proceedings of the 7th international conference on Supercomputing
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
A comparision of superscalar and decoupled access/execute architectures
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
A comparison of data prefetching on an access decoupled and superscalar machine
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The multicluster architecture: reducing cycle time through partitioning
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Performance modeling and code partitioning for the DS architecture
Proceedings of the 25th annual international symposium on Computer architecture
Clustered speculative multithreaded processors
ICS '99 Proceedings of the 13th international conference on Supercomputing
A Chip-Multiprocessor Architecture with Speculative Multithreading
IEEE Transactions on Computers
PIPE: a VLSI decoupled architecture
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
Proceedings of the 14th international conference on Supercomputing
Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
Speculative precomputation: long-range prefetching of delinquent loads
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
A large, fast instruction window for tolerating cache misses
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A scalable instruction queue design using dependence chains
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
The Alpha 21264 Microprocessor
IEEE Micro
Memory Latency Effects in Decoupled Architectures
IEEE Transactions on Computers
Hierarchical Scheduling Windows
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Decoupled access/execute computer architectures
ISCA '82 Proceedings of the 9th annual symposium on Computer Architecture
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
HiDISC: A Decoupled Architecture for Data-Intensive Applications
IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
Cyclone: a broadcast-free dynamic instruction scheduler with selective replay
Proceedings of the 30th annual international symposium on Computer architecture
Data-Flow Prescheduling for Large Instruction Windows in Out-of-Order Processors
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Speculative Data-Driven Multithreading
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
A Clustered Approach to Multithreaded Processors
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Exploring Wakeup-Free Instruction Scheduling
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Tornado warning: the perils of selective replay in multithreaded processors
Proceedings of the 19th annual international conference on Supercomputing
Future Execution: A Hardware Prefetching Technique for Chip Multiprocessors
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Design and effectiveness of small-sized decoupled dispatch queues
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Hi-index | 0.00 |
Continuing demands for high degrees of Instruction Level Parallelism (ILP) require large dispatch queues (or centralized reservation stations) in modern superscalar microprocessors. However, such large dispatch queues are inevitably accompanied by high circuit complexity which would correspondingly limit the pipeline clock rates. In other words, increasing the size of the dispatch queue ultimately hinders attempts at increasing the clock speed. This is due to the fact that most of today's designs are based upon a centralized dispatch queue which itself depends on globally broadcasting operations to wakeup and select the ready instructions. As an alternative to this conventional design, we propose the design of hierarchically distributed dispatch queues, based on the access/execute decoupled architectures. Simulation results based on 14 data intensive benchmarks show that while our DDQ (Decoupled Dispatch Queues) design achieves levels of performance which are comparable to what would be obtained in a superscalar machine with a large dispatch queue, our approach can be designed with small, distributed dispatch queues which consequently can be implemented with low hardware complexity and high clock rates.