Hierarchically organized multicomputers such as SMP clusters offer new opportunities and new challenges for high-performance computation, but realizing their full potential remains a formidable task. We present a hierarchical model of communication targeted to block-structured, bulk-synchronous applications running on dedicated clusters of symmetric multiprocessors. Our model supports node-level rather than processor-level communication as the fundamental operation, and is optimized for aggregate patterns of regular section moves rather than point-to-point messages. These two capabilities work synergistically: they provide flexibility in overlapping communication with computation and overcome deficiencies in the underlying communication layer on systems where inter-node communication bandwidth is at a premium. We have implemented our communication model in the KeLP2.0 run-time library. We present empirical results for five applications running on a cluster of Digital AlphaServer 2100s. Four of the applications were able to overlap communication with computation on a system that does not support overlap via non-blocking message passing in MPI. Overall performance improvements due to our overlap strategy ranged from 12% to 28%.
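The overlap idea described above can be illustrated with a minimal sketch. It is not the KeLP2.0 interface: the function names (`relax`, `move_section`, `step_overlapped`), the two-block layout, and the use of a Python thread as a stand-in for a node-level communication agent are all assumptions made for illustration. The ghost-cell moves for one step are gathered into a single list, a separate thread carries them out, and the cells that do not depend on ghost data are updated in the meantime.

```python
import threading

def relax(src, dst, lo, hi):
    """Three-point average on cells lo..hi-1, reading src and writing dst."""
    for i in range(lo, hi):
        dst[i] = (src[i - 1] + src[i] + src[i + 1]) / 3.0

def move_section(src, src_lo, dst, dst_lo, count):
    """Copy one contiguous section; stands in for an inter-node transfer."""
    dst[dst_lo:dst_lo + count] = src[src_lo:src_lo + count]

def step_overlapped(left, right):
    """One sweep over two blocks, each padded with one ghost cell per side."""
    n = len(left) - 2  # number of owned (non-ghost) cells per block
    # Output buffers are allocated before any ghost cell is overwritten.
    new_left, new_right = left[:], right[:]

    # Aggregate every ghost-cell move for this step into one description.
    moves = [
        (right, 1, left, n + 1, 1),  # right's first owned cell -> left's right ghost
        (left, n, right, 0, 1),      # left's last owned cell -> right's left ghost
    ]

    def communicate():
        for move in moves:
            move_section(*move)

    mover = threading.Thread(target=communicate)
    mover.start()

    # Cells 2..n-1 never read a ghost cell, so they can be updated
    # while the moves are still in flight.
    relax(left, new_left, 2, n)
    relax(right, new_right, 2, n)

    mover.join()  # ghost cells are now up to date

    # Edge cells read ghost data, so they are updated only after the join.
    for block, new_block in ((left, new_left), (right, new_right)):
        relax(block, new_block, 1, 2)
        relax(block, new_block, n, n + 1)
    return new_left, new_right
```

Python threads do not run these loops truly in parallel, so the sketch shows only the structure of the overlap (start the aggregate move, compute the independent interior, wait, finish the edges), not any performance benefit.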