Concurrency and Computation: Practice & Experience
We discuss the design and high-performance implementation of collective communication operations on distributed-memory computer architectures. Using a combination of known techniques (many of which were first proposed in the 1980s and early 1990s) along with careful exploitation of communication modes supported by MPI, we have developed implementations that improve performance in most situations compared to those currently supported by public domain implementations of MPI such as MPICH. Performance results from a large Intel Xeon/Pentium 4® processor cluster are included. Copyright © 2007 John Wiley & Sons, Ltd.
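The abstract does not name the specific techniques it combines. As an illustration only, the sketch below simulates one classical collective algorithm of that kind, a ring (often called "bucket") allgather, in plain Python with no MPI calls. The function name `ring_allgather` and the list-based "ranks" are assumptions made for this example and are not taken from the paper.

```python
def ring_allgather(local_blocks):
    """Simulate a ring ("bucket") allgather among p simulated ranks.

    local_blocks[r] is the block initially owned by rank r.
    Returns buffers, where buffers[r] is the full list of blocks
    as assembled by rank r after p - 1 communication steps.
    """
    p = len(local_blocks)
    # buffers[r][i] holds block i as known to rank r (None = not yet received).
    buffers = [[None] * p for _ in range(p)]
    for r in range(p):
        buffers[r][r] = local_blocks[r]

    for step in range(p - 1):
        # Gather all outgoing messages first, so that every rank
        # "sends" before any rank "receives" within the same step.
        messages = []
        for r in range(p):
            idx = (r - step) % p      # block that rank r forwards this step
            dest = (r + 1) % p        # right-hand neighbour in the ring
            messages.append((dest, idx, buffers[r][idx]))
        for dest, idx, block in messages:
            buffers[dest][idx] = block

    return buffers


if __name__ == "__main__":
    for rank, buf in enumerate(ring_allgather(["a", "b", "c", "d"])):
        print(rank, buf)
```

In each step every rank forwards a single block to its right-hand neighbour, so all p ranks hold all p blocks after p - 1 steps. An actual MPI implementation would replace the in-memory copies with paired send and receive calls.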