Tiling has proven to be an effective mechanism for developing high-performance implementations of algorithms. It can be used to organize computations so that communication costs in parallel programs are reduced and locality is enhanced in sequential codes or in the sequential components of parallel programs.

In this paper, we introduce a data type, the Hierarchically Tiled Array (HTA), that facilitates the direct manipulation of tiles. HTA operations are overloaded array operations. We argue that implementing HTAs in sequential object-oriented languages transforms these languages into powerful tools for developing high-performance parallel codes and codes with a high degree of locality. To support this claim, we discuss our experience implementing HTAs for MATLAB and C++ and rewriting the NAS benchmarks and a few other programs in HTA-based parallel form.
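To make the idea of directly manipulating tiles through overloaded array operations more concrete, the following is a minimal single-level sketch in Python. It is not the HTA library described in the paper: the class name `TiledArray` and the methods `from_matrix`, `tile`, and `to_matrix` are hypothetical names chosen for illustration, the tiling has only one level rather than a hierarchy, and everything runs sequentially.

```python
class TiledArray:
    """Toy 2-D array stored as a grid of equally sized tiles.

    Hypothetical illustration only; not the API of any HTA library.
    """

    def __init__(self, tiles):
        # tiles[i][j] is one tile, itself a list of rows.
        self.tiles = tiles

    @classmethod
    def from_matrix(cls, matrix, tile_rows, tile_cols):
        """Cut a dense list-of-lists matrix into tile_rows x tile_cols tiles."""
        n_rows, n_cols = len(matrix), len(matrix[0])
        if n_rows % tile_rows or n_cols % tile_cols:
            raise ValueError("matrix shape must be a multiple of the tile shape")
        tiles = [
            [
                [row[c:c + tile_cols] for row in matrix[r:r + tile_rows]]
                for c in range(0, n_cols, tile_cols)
            ]
            for r in range(0, n_rows, tile_rows)
        ]
        return cls(tiles)

    def tile(self, i, j):
        """Return the tile at grid position (i, j)."""
        return self.tiles[i][j]

    def __add__(self, other):
        # Overloaded element-wise addition, applied tile by tile.
        # Each tile pair is independent, so in a parallel setting every
        # tile could be assigned to a different worker (assumption: this
        # sketch simply loops over them sequentially).
        return TiledArray([
            [
                [[a + b for a, b in zip(row_a, row_b)]
                 for row_a, row_b in zip(tile_a, tile_b)]
                for tile_a, tile_b in zip(grid_row_a, grid_row_b)
            ]
            for grid_row_a, grid_row_b in zip(self.tiles, other.tiles)
        ])

    def to_matrix(self):
        """Reassemble the tiles into a dense list-of-lists matrix."""
        out = []
        for grid_row in self.tiles:
            rows_per_tile = len(grid_row[0])
            for r in range(rows_per_tile):
                out.append([x for t in grid_row for x in t[r]])
        return out


if __name__ == "__main__":
    dense = [[1, 2, 3, 4],
             [5, 6, 7, 8],
             [9, 10, 11, 12],
             [13, 14, 15, 16]]
    a = TiledArray.from_matrix(dense, 2, 2)   # a 2 x 2 grid of 2 x 2 tiles
    b = a + a                                 # overloaded, tile-wise addition
    print(a.tile(0, 1))                       # upper-right tile of a
    print(b.to_matrix())
```

Because the addition is defined per tile, the unit of work and the unit of data placement coincide, which is the property the abstract associates with reduced communication and better locality. A hierarchical version would let each tile itself be a `TiledArray`.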