This paper describes the capabilities, evolution, performance, and applications of the Global Arrays (GA) toolkit. GA was created to provide application programmers with an interface that allows them to distribute data while maintaining a global index space and a programming syntax similar to those available when programming on a single processor. The goal of GA is to free programmers from the low-level management of communication and allow them to deal with their problems at the level at which they were originally formulated. At the same time, the compatibility of GA with MPI enables programmers to take advantage of existing MPI software and libraries when available and appropriate. The variety of applications that have been implemented using Global Arrays attests to the attractiveness of using higher-level abstractions to write parallel code.