An architecture of a dataflow single chip processor
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Thread-based programming for the EM-4 hybrid dataflow machine
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
T: a multithreaded massively parallel architecture
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The J-machine multicomputer: an architectural evaluation
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
EMC-Y: parallel processing element optimizing communication and computation
ICS '93 Proceedings of the 7th international conference on Supercomputing
AP1000+: architectural support of PUT/GET interface for parallelizing compiler
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Monsoon: an explicit token-store architecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
APRIL: a processor architecture for multiprocessing
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Proceedings of the 8th International Symposium on Parallel Processing
I-structures: Data structures for parallel computing
Proceedings of the Workshop on Graph Reduction
SPLASH: Stanford parallel applications for shared-memory
SPLASH: Stanford parallel applications for shared-memory
Polling watchdog: combining polling and interrupts for efficient message handling
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Fine-grain multithreading with the EM-X multiprocessor
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Highly efficient implementation of MPI point-to-point communication using remote memory operations
ICS '98 Proceedings of the 12th international conference on Supercomputing
25 years of the international symposia on Computer architecture (selected papers)
Fast speculative search engine on the highly parallel computer EM-X
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
The Sisal project: real world functional programming
Compiler optimizations for scalable parallel systems
Tolerating communication latency through dynamic thread invocation in a multithreaded architecture
Compiler optimizations for scalable parallel systems
Asynchrony in parallel computing: from dataflow to multithreading
Progress in computer research
Asynchrony in parallel computing: from dataflow to multithreading
Progress in computer research
Design and implementation of FMPL, a fast message-passing library for remote memory operations
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
Effects of Multithreading on Data and Workload Distribution for Distributed-Memory Multiprocessors
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
The Sisal Model of Functional Programming and its Implementation
PAS '97 Proceedings of the 2nd AIZU International Symposium on Parallel Algorithms / Architecture Synthesis
Hi-index | 0.00 |
Latency tolerance is essential in achieving high performance on parallel computers for remote function calls and fine-grained remote memory accesses. EM-X supports interprocessor communication on an execution pipeline with small and simple packets. It can create a packet in one cycle, and receive a packet from the network in the on-chip buffer without interruption. EM-X invokes threads on packet arrival, minimizing the overhead of thread switching. It can tolerate communication latency by using efficient multi-threading and optimizing packet flow of fine grain communication. EM-X also supports the synchronization of two operands, direct remote memory read/write operations and flexible packet scheduling with priority. This paper describes distinctive features of the EM-X architecture and reports the performance of small synthetic programs and larger more realistic programs.