Scientific problems contain substantial data-level parallelism. The PARTY runtime system is an attempt to obtain efficient parallel implementations of scientific computations, particularly those whose data dependencies become manifest only at runtime, which can preclude compiler-based detection of certain types of parallelism. The automated system is structured as follows. First, an appropriate level of granularity is selected for the computations. Next, a directed acyclic graph representation of the program is generated, on which various aggregation techniques may be employed to generate efficient schedules. Finally, these schedules are mapped onto the target machine. We describe initial results from experiments conducted on the Intel Hypercube and the Encore Multimax that indicate the usefulness of our approach.
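The pipeline in the abstract (build a dependence graph at runtime, aggregate work into a schedule, map the schedule onto processors) can be illustrated with a minimal Python sketch. It is not PARTY's actual interface: the function names `wavefronts` and `map_to_processors`, the sample `deps` pattern, and the choice of wavefront grouping with round-robin placement as the aggregation and mapping steps are all assumptions made for illustration. The sketch also assumes every task depends only on lower-numbered tasks, as in a sparse lower-triangular solve.

```python
from collections import defaultdict


def wavefronts(deps):
    """Group tasks into wavefronts (levels) of the dependence DAG.

    deps[i] lists the tasks that must complete before task i. Every entry is
    assumed to be smaller than i, so a single forward pass suffices.
    """
    level = [0] * len(deps)
    for i, preds in enumerate(deps):
        # A task runs one step after its latest predecessor; tasks with no
        # predecessors land in wavefront 0.
        level[i] = 1 + max((level[j] for j in preds), default=-1)
    fronts = defaultdict(list)
    for i, lv in enumerate(level):
        fronts[lv].append(i)
    return [fronts[lv] for lv in sorted(fronts)]


def map_to_processors(fronts, num_procs):
    """Deal the tasks of each wavefront out to processors round-robin."""
    schedule = [[[] for _ in range(num_procs)] for _ in fronts]
    for step, tasks in enumerate(fronts):
        for k, task in enumerate(tasks):
            schedule[step][k % num_procs].append(task)
    return schedule


# Hypothetical dependence pattern, available only once the input data
# (for example, a sparsity structure) has been read at runtime.
deps = [[], [], [0], [0, 1], [2], [3, 4]]
fronts = wavefronts(deps)
schedule = map_to_processors(fronts, num_procs=2)
```

Tasks within one wavefront are mutually independent and may run concurrently; a synchronization point is assumed between consecutive wavefronts. A real system would likely weigh load balance and communication cost when aggregating and placing tasks, which this sketch ignores.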