Low-latency communication is the key to achieving a high-performance parallel computer. When using state-of-the-art processors, we must take cache memory into account. This paper presents an architecture for low-latency message communication, together with its implementation and a performance evaluation. We developed a message controller (MSC) that supports low-latency message-passing communication on the AP1000 by minimizing message-handling overhead. The MSC sends messages directly from cache memory and automatically receives incoming messages into a circular buffer. We designed communication functions between cells and evaluated communication performance by running benchmark programs: the Pingpong benchmark, the LINPACK benchmark, the SLALOM benchmark, and a solver using the scaled conjugate gradient method.
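The idea of receiving messages into a circular buffer can be illustrated with a small software model. The following is a minimal sketch only and does not describe the MSC hardware or the AP1000 programming interface: the class `RingBuffer`, its methods `deliver` and `receive`, and the choice to reject messages when the buffer is full are all assumptions made for illustration.

```python
class RingBuffer:
    """Fixed-size circular buffer of message slots (hypothetical model)."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.head = 0   # index of the next slot the program reads
        self.tail = 0   # index of the next slot the controller writes
        self.count = 0  # number of unread messages currently stored

    def deliver(self, message):
        """Controller side: store an arriving message.

        Returns False when the buffer is full (an assumed policy).
        """
        if self.count == len(self.slots):
            return False
        self.slots[self.tail] = message
        # Wrap around to slot 0 after the last slot.
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return True

    def receive(self):
        """Program side: return the oldest unread message, or None."""
        if self.count == 0:
            return None
        message = self.slots[self.head]
        self.slots[self.head] = None
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return message
```

In this model the writer never waits for the reader unless the buffer is full, so arriving messages can be stored without involving the receiving program, and the program later drains them in arrival order.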