Experiences with non-numeric applications on multithreaded architectures
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Hierarchical fuzzy configuration of implementation strategies
Proceedings of the 1999 ACM symposium on Applied computing
Asynchrony in parallel computing: from dataflow to multithreading
Progress in computer research
Compiling Several Classes of Communication Patterns on a Multithreaded Architecture
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Data locality sensitivity of multithreaded computations on a distributed-memory multiprocessor
CASCON '96 Proceedings of the 1996 conference of the Centre for Advanced Studies on Collaborative research
High-Level Data Parallel Programming in PROMOTER
HIPS '97 Proceedings of the 1997 Workshop on High-Level Programming Models and Supportive Environments (HIPS '97)
Dynamic load balancing efficiently in a large-scale cluster
International Journal of High Performance Computing and Networking
An efficient dynamic load-balancing algorithm in a large-scale cluster
ICA3PP'05 Proceedings of the 6th international conference on Algorithms and Architectures for Parallel Processing
Abstract: The sustained performance of superscalar microprocessors amounts to only a fraction of their peak performance rating. In parallel computers built from them, this discrepancy is even more dramatic. Reaching a satisfactory sustained performance for the single processor is mainly a compiler problem; the sustained performance of parallel computers depends also on other components of the architecture, such as the interconnect and the operating system. It is shown how, through a combination of innovative architectural solutions, the sustained performance of a distributed-memory parallel computer can be significantly improved. The key to effective latency hiding by overlapping communication and computation is the operating system. The programmability of such architectures can be enhanced by providing the programmer with parallelizing compilers and/or a global address space realized through virtual shared memory. All these measures have been incorporated in the MANNA computer described in the paper, and benchmark performance figures obtained with it are reported.
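The abstract's central technique, latency hiding by overlapping communication and computation, can be illustrated independently of the MANNA hardware. The following is a minimal sketch in Python, not the paper's implementation: a background thread prefetches the next "remote" block (simulated here by fetch_remote_block with an artificial delay) while the main thread computes on the current one, so communication latency is hidden behind useful work. All names and timings are illustrative assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_remote_block(block_id):
    # Stand-in for a remote-memory fetch; the sleep models network latency.
    time.sleep(0.05)
    return [block_id * 10 + i for i in range(4)]

def compute(block):
    # Stand-in for the local computation performed on each fetched block.
    return sum(x * x for x in block)

def overlapped(n_blocks):
    """Process n_blocks, prefetching block b+1 while computing on block b."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch_remote_block, 0)   # start the first fetch
        for b in range(n_blocks):
            block = future.result()                   # wait for current block
            if b + 1 < n_blocks:
                # Issue the next fetch before computing: the communication
                # for block b+1 overlaps the computation on block b.
                future = pool.submit(fetch_remote_block, b + 1)
            results.append(compute(block))
    return results
```

With perfect overlap, the total run time approaches max(total fetch time, total compute time) instead of their sum, which is the effect the paper attributes to operating-system support on MANNA.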