This paper addresses the enormous complexity of mapping applications to current and future highly parallel platforms, including scalable architectures consisting of tens of thousands of nodes, many-core devices with tens to hundreds of cores, and hierarchical systems providing multi-level parallelism. On systems of this scale, the performance of many important algorithms is dominated by the time required to move data across the levels of the memory hierarchy. As a consequence, locality awareness in algorithms and efficient management of communication are essential for obtaining scalable parallel performance, and are of particular concern for applications characterized by irregular memory access patterns. We describe the design of a programming system that focuses on the productivity of application programmers in expressing locality-aware algorithms for high-end architectures, which are then automatically tuned for performance. The approach combines two successful concepts for managing locality: high-level specification of user-defined data distributions and model-guided autotuning for data locality. The resulting system provides a powerful, general mechanism for specifying data distributions that can express domain-specific knowledge. It also facilitates automatic tuning of a distribution to the access patterns of an algorithm and the application of that distribution to different levels of a memory hierarchy. Because the specification of a data distribution is cleanly separated from the algorithms that use it, the two can be written separately and composed to quickly develop new applications, which can then be tuned in the context of their data set and execution environment. We address key issues for a range of codes that include LU Decomposition, Sparse Matrix-Vector Multiply and Knowledge Discovery.
The knowledge discovery algorithms, in particular, stress the proposed language and compiler technology and provide a forcing function for developing tools that address inherent challenges of irregular applications.