The EM-X parallel computer: architecture and basic performance

Authors:
Yuetsu Kodama;Hirohumi Sakane;Mitsuhisa Sato;Hayato Yamana;Shuichi Sakai;Yoshinori Yamaguchi
Affiliations:
Electrotechnical Laboratory, 1-1-4, Umezono, Tsukuba, Ibaraki 305 Japan;Electrotechnical Laboratory, 1-1-4, Umezono, Tsukuba, Ibaraki 305 Japan;Electrotechnical Laboratory, 1-1-4, Umezono, Tsukuba, Ibaraki 305 Japan;Electrotechnical Laboratory, 1-1-4, Umezono, Tsukuba, Ibaraki 305 Japan;Real World Computing Partnership, 1-6-1, Takezono, Tsukuba, Ibaraki 305 Japan;Electrotechnical Laboratory, 1-1-4, Umezono, Tsukuba, Ibaraki 305 Japan
Venue:
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Year:
1995

Abstract

Latency tolerance is essential to achieving high performance on parallel computers for remote function calls and fine-grain remote memory accesses. EM-X supports interprocessor communication in its execution pipeline using small, simple packets. It can create a packet in one cycle and receive a packet from the network into the on-chip buffer without interruption. EM-X invokes threads on packet arrival, minimizing the overhead of thread switching. It tolerates communication latency through efficient multithreading and by optimizing the packet flow of fine-grain communication. EM-X also supports two-operand synchronization, direct remote memory read/write operations, and flexible priority-based packet scheduling. This paper describes the distinctive features of the EM-X architecture and reports the performance of small synthetic programs and larger, more realistic programs.
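The packet-driven model in the abstract can be illustrated with a toy software simulation. This is only a sketch under stated assumptions, not the EM-X hardware design: the `Packet` fields, the `PE` class, the two priority levels, and the `run` loop are all hypothetical names chosen for illustration, and the abstract does not specify the packet format or the number of priority levels. The sketch shows three ideas: a handler is invoked when a packet is dequeued, a remote read is served by the destination's packet handler without involving any other running code, and higher-priority packets are dequeued first.

```python
from collections import deque
from dataclasses import dataclass, field

HIGH, LOW = 0, 1  # hypothetical priority levels (assumption, not from the paper)


@dataclass
class Packet:
    dest: int             # index of the destination processing element
    kind: str             # "write", "read", or "reply"
    addr: int = 0
    data: int = 0
    reply_to: int = -1    # element that should receive a read result
    priority: int = LOW


@dataclass
class PE:
    """Toy processing element with one input queue per priority level."""
    ident: int
    memory: dict = field(default_factory=dict)
    queues: tuple = field(default_factory=lambda: (deque(), deque()))
    received: list = field(default_factory=list)

    def accept(self, pkt):
        # Buffer the arriving packet; nothing already running is interrupted.
        self.queues[pkt.priority].append(pkt)

    def step(self):
        """Dequeue one packet (high priority first) and run its handler.

        Returns the list of packets the handler sends in response.
        """
        for q in self.queues:
            if q:
                pkt = q.popleft()
                break
        else:
            return []  # no packet pending
        if pkt.kind == "write":
            self.memory[pkt.addr] = pkt.data
            return []
        if pkt.kind == "read":
            value = self.memory.get(pkt.addr, 0)
            return [Packet(dest=pkt.reply_to, kind="reply", addr=pkt.addr,
                           data=value, priority=pkt.priority)]
        # "reply": in this sketch the waiting thread just records the value.
        self.received.append(pkt.data)
        return []


def run(pes, initial_packets):
    """Deliver packets and step every element until no work remains."""
    in_flight = deque(initial_packets)
    while in_flight or any(q for pe in pes for q in pe.queues):
        while in_flight:
            pkt = in_flight.popleft()
            pes[pkt.dest].accept(pkt)
        for pe in pes:
            in_flight.extend(pe.step())


if __name__ == "__main__":
    pes = [PE(0), PE(1)]
    run(pes, [
        Packet(dest=1, kind="write", addr=8, data=42),
        Packet(dest=1, kind="read", addr=8, reply_to=0),
    ])
    print(pes[0].received)
```

In this sketch element 0 issues a remote write followed by a remote read to element 1, and the read result comes back as a reply packet. In a real multithreaded machine the requesting element would run other threads while the reply is in flight; the simulation omits that and models only the packet flow.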