Portability, efficiency, and ease of coding are all important considerations in choosing the programming model for a scalable parallel application. The message-passing programming model is widely used because of its portability, yet some applications are too complex to code in it while also maintaining a balanced computational load and avoiding redundant computation. The shared-memory programming model simplifies coding, but it is not portable and often provides little control over interprocessor data-transfer costs. This paper describes a new approach, called Global Arrays (GA), that combines the better features of both models, yielding both simple coding and efficient execution. The key concept of GA is a portable interface through which each process in a MIMD parallel program can asynchronously access logical blocks of physically distributed matrices, with no need for explicit cooperation by other processes. We have implemented GA libraries on a variety of computer systems, including the Intel DELTA and Paragon and the IBM SP-1 (all distributed-memory message-passing machines), the Kendall Square KSR-2 (a nonuniform-access shared-memory machine), and networks of Unix workstations. We discuss the design and implementation of these libraries, report their performance, illustrate the use of GA in the context of computational chemistry applications, and describe the use of a GA performance visualization tool.
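The core idiom the abstract describes — any process reading or updating an arbitrary logical block of a physically distributed matrix, without the owning process's cooperation — can be sketched conceptually. The following is a toy single-address-space model, not the real GA API; the class and method names (`GlobalArray`, `get`, `acc`) are illustrative stand-ins for GA's block get/accumulate operations.

```python
# Conceptual sketch (NOT the real Global Arrays API): a logically shared
# 2-D array whose rows are physically partitioned among owners, with
# block-wise get and accumulate access from any process.
class GlobalArray:
    """Toy model of a GA-style distributed matrix.

    Each "process" owns a contiguous band of rows (the physical
    distribution); any process may read or update an arbitrary logical
    block one-sidedly, simulated here within one address space.
    """

    def __init__(self, nrows, ncols, nprocs):
        self.nrows, self.ncols = nrows, ncols
        # Row-banded distribution: row i belongs to process owner[i].
        band = (nrows + nprocs - 1) // nprocs
        self.owner = [min(i // band, nprocs - 1) for i in range(nrows)]
        self.data = [[0.0] * ncols for _ in range(nrows)]

    def get(self, lo, hi):
        """Copy the logical block with corners lo..hi (inclusive)."""
        return [row[lo[1]:hi[1] + 1] for row in self.data[lo[0]:hi[0] + 1]]

    def acc(self, lo, hi, block):
        """Accumulate (element-wise add) a local block into the array."""
        for i, row in enumerate(block):
            for j, v in enumerate(row):
                self.data[lo[0] + i][lo[1] + j] += v


# A process updates a 2x2 block and reads it back, regardless of which
# process the underlying rows physically belong to.
ga = GlobalArray(4, 4, nprocs=2)
ga.acc((0, 0), (1, 1), [[1.0, 2.0], [3.0, 4.0]])
blk = ga.get((0, 0), (1, 1))
```

In the real library these operations become one-sided communication (remote get/accumulate) against the owning node's memory, which is what lets GA combine shared-memory-style coding with explicit, locality-aware data transfer.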