Design and performance evaluation of a multithreaded architecture

Authors:
R. Govindarajan;S. S. Nemawarkar;P. LeNir
Venue:
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Year:
1995

Abstract

Multithreaded architectures can tolerate long memory latencies and unpredictable synchronization delays. We propose a multithreaded architecture that exploits both coarse-grain parallelism and fine-grain instruction-level parallelism in a program. Instruction-level parallelism is exploited by grouping instructions from a number of active threads at runtime. The architecture supports multiple resident activations to improve the extent of locality exploited. Further, a distributed data structure cache organization is proposed to reduce both the network traffic and the latency of accessing remote locations. Initial performance evaluation using discrete-event simulation indicates that the architecture is capable of achieving very high processor throughput. The introduction of the data structure cache reduces the network latency significantly. The impact of various cache organizations on the performance of the architecture is also discussed in this paper.
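
The idea of grouping instructions from several active threads at runtime can be illustrated with a small sketch. The abstract does not describe the actual grouping policy, so everything here is an assumption made for illustration: the function name `group_instructions`, the `issue_width` parameter, and the round-robin rule of taking at most one instruction per thread per cycle are all hypothetical and are not taken from the paper.

```python
from collections import deque


def group_instructions(threads, issue_width):
    """Form per-cycle issue groups from several active threads.

    Hypothetical policy (an assumption, not the paper's mechanism):
    each cycle takes at most one instruction from each active thread,
    visiting threads round-robin, until `issue_width` slots are filled.
    """
    # Keep only threads that still have instructions to issue.
    active = deque(
        (name, deque(instrs)) for name, instrs in threads.items() if instrs
    )
    groups = []
    while active:
        group = []
        # The slot count is fixed at the start of the cycle, so a thread
        # rotated to the back cannot be picked twice in the same group.
        for _ in range(min(issue_width, len(active))):
            name, instrs = active.popleft()
            group.append((name, instrs.popleft()))
            if instrs:
                active.append((name, instrs))  # still active: rotate to back
        groups.append(group)
    return groups


if __name__ == "__main__":
    threads = {"t0": ["a0", "a1", "a2"], "t1": ["b0"], "t2": ["c0", "c1"]}
    for cycle, group in enumerate(group_instructions(threads, issue_width=2)):
        print(cycle, group)
```

Under this toy policy, a single thread fills only one slot per cycle regardless of `issue_width`, while several active threads can fill the whole group. That is one simple way to see why drawing from multiple threads may raise throughput, although a real design would also have to account for dependences, functional-unit limits, and memory latency.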