An architecture of a dataflow single chip processor
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Can dataflow subsume von Neumann computing?
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
The EPSILON-2 multiprocessor system
Journal of Parallel and Distributed Computing - Special issue: data-flow processing
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
T: a multithreaded massively parallel architecture
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Processor coupling: integrating compile time and runtime scheduling for parallelism
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Active messages: a mechanism for integrated communication and computation
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Monsoon: an explicit token-store architecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Parallelism Control and Storage Management in Datarol PE
Proceedings of the IFIP 12th World Computer Congress on Algorithms, Software, Architecture - Information Processing '92, Volume 1 - Volume I
ICS '95 Proceedings of the 9th international conference on Supercomputing
A practical processor design for multithreading
FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
Proceedings of the 4th international conference on Computing frontiers
Multithreaded architecture for multimedia processing
Integrated Computer-Aided Engineering
A continuation-based noninterruptible multithreading processor architecture
The Journal of Supercomputing
SpamWatcher: a streaming social network analytic on the IBM wire-speed processor
Proceedings of the 5th ACM international conference on Distributed event-based system
Hi-index | 0.00 |
Latency, caused by remote memory access and remote procedure call, is one of the most serious problems in massively parallel computers. In order to eliminate the processors' idle time caused by these latencies, processors must perform fast context switching among fine-grain concurrent processes. In this paper, we propose a processor architecture, called Datarol-II, that promotes efficient fine-grain multi-thread execution by performing fast context switching among fine-grain concurrent processes. In the Datarol-II processor, an implicit register load/store mechanism is embedded in the execution pipeline in order to reduce memory access overhead caused by context switching. In order to reduce local memory access latency, a two-level hierarchical memory system and a load control mechanism are also introduced. We describe the Datarol-II processor architecture, and show its evaluation results.