Future scientific programming on parallel machines
Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
The NAS parallel benchmarks—summary and preliminary results
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Compiler-controlled multithreading for lenient parallel languages
Proceedings of the 5th ACM conference on Functional programming languages and computer architecture
SPLASH: Stanford parallel applications for shared-memory
SPLASH: Stanford parallel applications for shared-memory
The Implementation of a Threaded Abstract Machine
The Implementation of a Threaded Abstract Machine
T: integrated building blocks for parallel computing
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
The Stanford FLASH multiprocessor
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Tempest and typhoon: user-level shared memory
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
POPL '95 Proceedings of the 22nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Evaluating the locality benefits of active messages
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Remote queues: exposing message queues for optimization and atomicity
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
How much non-strictness do lenient programs require?
FPCA '95 Proceedings of the seventh international conference on Functional programming languages and computer architecture
A design study of the EARTH multiprocessor
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Polling watchdog: combining polling and interrupts for efficient message handling
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
25 years of the international symposia on Computer architecture (selected papers)
The Stanford FLASH multiprocessor
25 years of the international symposia on Computer architecture (selected papers)
Tempest and typhoon: user-level shared memory
25 years of the international symposia on Computer architecture (selected papers)
Measuring Cache and TLB Performance and Their Effect on Benchmark Runtimes
IEEE Transactions on Computers
FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
HeapMon: a helper-thread approach to programmable, automatic, and low-overhead memory bug detection
IBM Journal of Research and Development
Hi-index | 0.00 |
This paper uses an abstract machine approach to compare the mechanisms of two parallel machines: the J-Machine and the CM-5. High-level parallel programs are translated by a single optimizing compiler to a fine-grained abstract parallel machine, TAM. A final compilation step is unique to each machine and optimizes for specifics of the architecture. By determining the cost of the primitives and weighting them by their dynamic frequency in parallel programs, we quantify the effectiveness of the following mechanisms individually and in combination. Efficient processor/network coupling proves valuable. Message dispatch is found to be less valuable without atomic operations that allow the scheduling levels to cooperate. Multiple hardware contexts are of small value when the contexts cooperate and the compiler can partition the register set. Tagged memory provides little gain. Finally, the performance of the overall system is strongly influenced by the performance of the memory system and the frequency of control operations.