Run-time scheduling and execution of loops on message passing machines. Journal of Parallel and Distributed Computing, special issue: algorithms for hypercube computers.
The High Performance Fortran Handbook.
Context-sensitive interprocedural points-to analysis in the presence of function pointers. PLDI '94: Proceedings of the ACM SIGPLAN 1994 Conference on Programming Language Design and Implementation.
Efficient support for irregular applications on distributed-memory machines. PPOPP '95: Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.
Interprocedural compilation of irregular applications for distributed memory machines. Supercomputing '95: Proceedings of the 1995 ACM/IEEE Conference on Supercomputing.
Index array flattening through program transformation. Supercomputing '95: Proceedings of the 1995 ACM/IEEE Conference on Supercomputing.
A study of the EARTH-MANNA multithreaded system. International Journal of Parallel Programming, special issue on parallel architectures and compilation techniques, part II.
Interprocedural data flow based optimizations for distributed memory compilation. Software—Practice & Experience.
Compiling C for the EARTH multithreaded architecture. International Journal of Parallel Programming, special issue: selected papers from PACT '96, Fourth International Conference on Parallel Architectures and Compilation Techniques, part I.
Communication optimizations for parallel C programs. PLDI '98: Proceedings of the ACM SIGPLAN 1998 Conference on Programming Language Design and Implementation.
Proceedings of the ACM SIGPLAN 1999 Conference on Programming Language Design and Implementation.
Improving memory hierarchy performance for irregular applications. ICS '99: Proceedings of the 13th International Conference on Supercomputing.
Proceedings of the 14th International Conference on Supercomputing.
Automatic compiler techniques for thread coarsening for multithreaded architectures. Proceedings of the 14th International Conference on Supercomputing.
High Performance Compilers for Parallel Computing.
Parallelizing Molecular Dynamics Programs for Distributed-Memory Machines. IEEE Computational Science & Engineering.
Distributed Memory Compiler Design for Sparse Problems. IEEE Transactions on Computers.
Compiling Global Name-Space Parallel Loops for Distributed Execution. IEEE Transactions on Parallel and Distributed Systems.
Exploiting spatial regularity in irregular iterative applications. IPPS '95: Proceedings of the 9th International Symposium on Parallel Processing.
Euro-Par '00: Proceedings from the 6th International Euro-Par Conference on Parallel Processing.
On the Automatic Parallelization of Sparse and Irregular Fortran Programs. LCR '98: Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers.
A Comparison of Locality Transformations for Irregular Codes. LCR '00: Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers.
Heap Analysis and Optimizations for Threaded Programs. PACT '97: Proceedings of the 1997 International Conference on Parallel Architectures and Compilation Techniques.
Compiling Several Classes of Communication Patterns on a Multithreaded Architecture. IPDPS '02: Proceedings of the 16th International Parallel and Distributed Processing Symposium.
An Adaptive Algorithm Selection Framework for Reduction Parallelization. IEEE Transactions on Parallel and Distributed Systems.
An analytical model of locality-based parallel irregular reductions. Parallel Computing.
Computations from many scientific and engineering domains use irregular meshes and/or sparse matrices. The codes expressing these computations involve irregular reductions, which pose many challenges to parallel architectures and their compilers in terms of parallelization, locality management, and communication optimization.

Multithreaded architectures offer rich support for local synchronization, overlapping of communication and computation, and low-overhead communication and thread switching. They therefore appear promising for scalable parallelization of irregular reductions. This paper presents an execution model and a compilation strategy for supporting irregular reductions on a fine-grained multithreaded architecture. The key aspect of this strategy is that the frequency and volume of communication are independent of the contents of the indirection arrays. The performance obtained depends on the architecture's ability to overlap communication and computation, and is largely independent of the partitioning of the problem.

We present experimental results from compiling three scientific kernels involving irregular reductions (mvm, euler, and moldyn) for execution on the EARTH fine-grained multithreaded architecture. On mvm, which does not involve any left-hand-side irregular accesses, we achieve near-linear absolute speedups. For euler and moldyn, which do involve left-hand-side irregular accesses, our strategy initially incurs some overheads, but the relative speedups are very good. In going from 2 to 32 processors, the relative speedups for euler were 9.28 and 10.36 on its two datasets, while the speedups for moldyn were 9.70 and 10.76 on its two datasets.