Asynchrony in parallel computing: from dataflow to multithreading
Progress in computer research
Scheduled Dataflow: Execution Paradigm, Architecture, and Performance Evaluation
IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
Asynchrony in parallel computing: from dataflow to multithreading
Progress in computer research
A survey of processors with explicit multithreading
ACM Computing Surveys (CSUR)
An Efficient Way of Passing of Data in a Multithreaded Scheduled Dataflow Architecture
HPCASIA '05 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region
Fairness and Throughput in Switch on Event Multithreading
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Fairness enforcement in switch on event multithreading
ACM Transactions on Architecture and Code Optimization (TACO)
A hybrid closed queuing network approach to model dataflow in networked distributed processors
Computer Communications
A closed queuing network model with multiple servers for multi-threaded architecture
Computer Communications
A hybrid closed queuing network model for multi-threaded dataflow architecture
Computers and Electrical Engineering
Hi-index | 0.01 |
The multithreaded processor - called Rhamma - uses a fast context switch to bridge latencies caused by memory accesses or by synchronization operations. Load/store, synchronization, and execution operations of different threads of control are executed simultaneously by appropriate functional units. A fast context switch is performed whenever a functional unit comes across an operation that is destined for another unit. The overall performance depends on the speed of the context switch. We present two techniques to reduce the context switch cost to at most one processor cycle: A context switch is explicitly coded in the opcode, and a context switch buffer is used. The load/store unit shows up as the principal bottleneck. We evaluate four implementation alternatives of the load/store unit to increase processor performance.