An architecture of a dataflow single chip processor
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
Executing a Program on the MIT Tagged-Token Dataflow Architecture
IEEE Transactions on Computers
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Load balancing by function distribution on the EM-4 prototype
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
The DASH prototype: implementation and performance
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
APRIL: a processor architecture for multiprocessing
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Super-threading: architectural and software mechanisms for optimizing parallel computation
ICS '93 Proceedings of the 7th international conference on Supercomputing
The EM-X parallel computer: architecture and basic performance
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Multithreading with the EM-4 distributed-memory multiprocessor
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Fine-grain multithreading with the EM-X multiprocessor
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Highly efficient implementation of MPI point-to-point communication using remote memory operations
ICS '98 Proceedings of the 12th international conference on Supercomputing
Tolerating communication latency through dynamic thread invocation in a multithreaded architecture
Compiler optimizations for scalable parallel systems
Asynchrony in parallel computing: from dataflow to multithreading
Progress in computer research
Asynchrony in parallel computing: from dataflow to multithreading
Progress in computer research
Nonnumeric search results on the EM-4 distributed-memory multiprocessor
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Hi-index | 0.00 |
EMC-Y is a new processing element for highly parallel computers designed to achieve high performance parallel computation by fusing a dataflow mechanism and a von Neumann execution pipeline. We have already developed EMC-R, which is the processing element used in the EM-4 prototype. EMC-Y improves on EMC-R's packet communication performance, allowing it to tolerate a more network traffic. This paper presents the architecture of EMC-Y, concentrating on the principles of packet communication. EMC-Y uses an output packet buffer and optimal packet routing to improve the performance of packet sending and transferring. EMC-Y changes the memory access priority for input packet buffer operation to improve the performance of receiving packets. Since the EMC-Y processor not only improves the performance of packet input and output but also balances them, it can tolerate a large amount of traffic and can improve the execution performance. We evaluate the improvements of EMC-Y architecture using a clock level simulator. The results show that EMC-Y improves performance by 50% to 70% in several programs over EMC-R at the same clock speed.