Architecture of a message-driven processor
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
The architecture and programming of the Ametek series 2010 multicomputer
C3P Proceedings of the third conference on Hypercube concurrent computers and applications: Architecture, software, computer systems, and general issues - Volume 1
iPSC/2 system: a second generation hypercube
C3P Proceedings of the third conference on Hypercube concurrent computers and applications: Architecture, software, computer systems, and general issues - Volume 1
An architecture of a dataflow single chip processor
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Interprocessor communication speed and performance in distributed-memory parallel processors
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Performance Analysis of k-ary n-cube Interconnection Networks
IEEE Transactions on Computers
A message passing coprocessor for distributed memory multicomputers
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Comparative evaluation of latency reducing and tolerating techniques
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Data-parallel programming on MIMD computers
Data-parallel programming on MIMD computers
T: a multithreaded massively parallel architecture
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Improved multithreading techniques for hiding communication latency in multiprocessors
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Active messages: a mechanism for integrated communication and computation
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Low-latency message communication support for the AP1000
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The impact of communication locality on large-scale multiprocessor performance
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Evaluation of compiler generated parallel programs on three multicomputers
ICS '92 Proceedings of the 6th international conference on Supercomputing
IEEE Transactions on Parallel and Distributed Systems
Design and Implementation of an Interconnection Network for the AP1000
Proceedings of the IFIP 12th World Computer Congress on Algorithms, Software, Architecture - Information Processing '92, Volume 1 - Volume I
AP1000+: architectural support of PUT/GET interface for parallelizing compiler
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Performance prediction of parallel systems with scalable specifications—methodology and case study
ACM SIGMETRICS Performance Evaluation Review
ICS '96 Proceedings of the 10th international conference on Supercomputing
A Loop Transformation Algorithm for Communication Overlapping
International Journal of Parallel Programming - Special issue on international symposium on high performance computing 1997, part I
Parallel N-ary Speculative Computation of Simulated Annealing
IEEE Transactions on Parallel and Distributed Systems
Event-Based Study of the Effect of Execution Environments on Parallel Program Performance
MASCOTS '96 Proceedings of the 4th International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems
Hi-index | 0.00 |
The performance of message-passing applications depends on cpu speed, communication throughput and latency, and message handling overhead. In this paper we investigate the effect of varying these parameters and applying techniques to reduce message handling overhead on the execution efficiency of ten different applications. Using a message level simulator set up for the architecture of the AP1000, we showed that improving communication performance, especially message handling, improves total performance. If a cpu that is 32 times faster is provided, the total performance increases by less than ten times unless message handling overhead is reduced. Overlapping computation with message reception improves performance significantly. We also discuss how to improve the AP1000 architecture.