Object-oriented languages can be easily extended with new data types, which facilitates the prototyping of new language extensions. A particularly challenging problem is the development of data types that encapsulate data-parallel operations, which could improve parallel programming productivity. However, implementing data types as class libraries, particularly when they encapsulate parallelism, incurs performance overhead. This paper describes our experience with the implementation of a C++ data type called the hierarchically tiled array (HTA). This object includes data-parallel operations and allows the manipulation of tiles, which facilitates the development of efficient parallel codes and codes with a high degree of locality. The initial performance of the HTA programs we wrote was lower than that of their conventional MPI-based counterparts. The overhead was due to factors such as the creation of temporary HTAs and the inability of the compiler to properly inline index computations, among others. We describe the performance problems and the optimizations applied to overcome them, as well as their impact on programmability. After the optimization process, our HTA-based implementations run only slightly slower than the MPI-based codes while having much better programmability metrics.
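To make the temporary-creation overhead concrete, the following is a minimal C++ sketch. It assumes a toy one-dimensional tiled array and is not the HTA library's API: `TiledArray`, `allocation_count`, `at`, and `add3_into` are hypothetical names introduced only for illustration. It contrasts an overloaded `operator+`, which builds a new array for every binary operation, with a fused loop that writes into an existing array through a small tile-relative index function of the kind a compiler needs to inline.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical counter of arrays constructed so far (illustration only).
inline std::size_t& allocation_count() {
    static std::size_t n = 0;
    return n;
}

// Hypothetical toy type, NOT the HTA API: a 1-D array split into equal tiles.
struct TiledArray {
    std::size_t tiles;         // number of tiles
    std::size_t tile_size;     // elements per tile
    std::vector<double> data;  // tiles stored back to back

    TiledArray(std::size_t t, std::size_t ts, double v = 0.0)
        : tiles(t), tile_size(ts), data(t * ts, v) {
        ++allocation_count();
    }

    // Tile-relative index computation, kept small so it can be inlined.
    double& at(std::size_t tile, std::size_t i) {
        return data[tile * tile_size + i];
    }
    double at(std::size_t tile, std::size_t i) const {
        return data[tile * tile_size + i];
    }
};

// Convenient notation, but every call constructs a fresh result array.
inline TiledArray operator+(const TiledArray& x, const TiledArray& y) {
    assert(x.tiles == y.tiles && x.tile_size == y.tile_size);
    TiledArray r(x.tiles, x.tile_size);
    for (std::size_t k = 0; k < r.data.size(); ++k) {
        r.data[k] = x.data[k] + y.data[k];
    }
    return r;
}

// Fused alternative: one pass over the tiles, no new arrays constructed.
inline void add3_into(TiledArray& out, const TiledArray& a,
                      const TiledArray& b, const TiledArray& c) {
    assert(out.data.size() == a.data.size());
    assert(a.data.size() == b.data.size() && b.data.size() == c.data.size());
    for (std::size_t t = 0; t < out.tiles; ++t) {
        for (std::size_t i = 0; i < out.tile_size; ++i) {
            out.at(t, i) = a.at(t, i) + b.at(t, i) + c.at(t, i);
        }
    }
}
```

In this sketch, evaluating `a + b + c` constructs two arrays (one per `+`), whereas `add3_into(out, a, b, c)` constructs none. The fused form is less convenient to write, which illustrates the kind of trade-off between performance and programmability that the abstract mentions.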