The Need for Fast Communication in Hardware-Based Speculative Chip Multiprocessors

Authors:
Venkata Krishnan;Josep Torrellas
Affiliations:
Alpha Development Group, Compaq Computer Corporation, Shrewsbury, Massachusetts 01545. Note: This work was done at the Computer Science Department of the University of Illinois at Urbana-Champaign ...;Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801. torrellas@cs.uiuc.edu
Venue:
International Journal of Parallel Programming
Year:
2001

Citing 23
Cited 1

Register traffic analysis for streamlining inter-operation communication in fine-grain parallel processors

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
The anatomy of the register file in a multiscalar processor

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Multiscalar processors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Single-program speculative multithreading (SPSM) architecture: compiler-assisted fine-grained multithreading

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
The M-Machine multicomputer

Proceedings of the 28th annual international symposium on Microarchitecture
ARB: A Hardware Mechanism for Dynamic Reordering of Memory References

IEEE Transactions on Computers
Trace cache: a low latency approach to high bandwidth instruction fetching

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading

ACM Transactions on Computer Systems (TOCS)
Improving superscalar instruction dispatch and issue by exploiting dynamic code sequences

Proceedings of the 24th annual international symposium on Computer architecture
Trace processors

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Task selection for a multiscalar processor

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Data speculation support for a chip multiprocessor

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Clustered speculative multithreaded processors

ICS '99 Proceedings of the 13th international conference on Supercomputing
A Chip-Multiprocessor Architecture with Speculative Multithreading

IEEE Transactions on Computers
Parallel Programming with Polaris

Computer
Maximizing Multiprocessor Performance with the SUIF Compiler

Computer
Will Physical Scalability Sabotage Performance Gains?

Computer
MINT: A Front End for Efficient Simulation of Shared-Memory Multiprocessors

MASCOTS '94 Proceedings of the Second International Workshop on Modeling, Analysis, and Simulation On Computer and Telecommunication Systems
The Potential for Using Thread-Level Data Speculation to Facilitate Automatic Parallelization

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Speculative Versioning Cache

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
An Direct-Execution Framework for Fast and Accurate Simulation of Superscalar Processors

PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation

PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques

Towards the ideal on-chip fabric for 1-to-many and many-to-1 communication

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture

Quantified Score

Hi-index	0.00

Visualization

Abstract

Chip-multiprocessor (CMP) architectures are a promising design alternative to exploit the ever-increasing number of transistors that can be put on a die. To deliver high performance on applications that cannot be easily parallelized, CMPs can use additional support for speculatively executing the possibly data-dependent threads of an application. For cross-thread dependences that must be handled dynamically, the threads can be made to synchronize and communicate either at the register level or at the memory level. In the past, it has been unclear whether the higher hardware cost of register-level communication is cost-effective. In this paper, we show that the wide-issue dynamic processors that will soon populate CMPs, make fast communication a requirement for high performance. Consequently, we propose an effective hardware mechanism to support communication and synchronization of registers between on-chip processors. Our scheme adds enough support to enable register-level communication without specializing the architecture toward speculation much. Finally, our scheme allows the system to achieve near ideal performance.