Programming with tiles

Authors:
Jia Guo;Ganesh Bikshandi;Basilio B. Fraguela;Maria J. Garzaran;David Padua
Affiliations:
UIUC, Urbana, IL, USA; IBM, Bangalore, India; Universidade da Coruna, La Coruna, Spain; UIUC, Urbana, IL, USA; UIUC, Urbana, IL, USA
Venue:
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Year:
2008

Abstract

Tiles, or blocks, are central to scientific computing. Many algorithms, both iterative and recursive, can be expressed naturally when tiles are represented explicitly. From a performance standpoint, tiling, whether as a code transformation or a data layout transformation, is one of the most effective ways to exploit locality, which is essential for good performance on current computers because of the large gap in speed between processor and memory. Tiles are also useful for expressing data distribution in parallel computations. Despite their importance, however, most languages do not support tiles directly. The result is bloated programs crowded with subscript expressions, which make the code hard to read and coding mistakes more likely. This paper discusses Hierarchically Tiled Arrays (HTAs), a data type that facilitates the manipulation of tiles in object-oriented languages, with emphasis on two new features: dynamic partitioning and overlapped tiling. These features facilitate the expression of locality and communication while maintaining the performance of algorithms written in conventional languages.
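
The abstract's point about explicit tiles can be illustrated with a minimal Python sketch of a blocked matrix multiplication. This is not the HTA library's API: `TiledMatrix`, `tile_multiply_add`, and `tiled_matmul` are hypothetical names invented for illustration, and the sketch assumes square matrices whose size is a multiple of the tile size. Neither dynamic partitioning nor overlapped tiling is modeled here.

```python
class TiledMatrix:
    """Hypothetical container: a square matrix stored as a grid of square tiles."""

    def __init__(self, n, tile):
        # Assumption: the matrix size is an exact multiple of the tile size.
        assert n % tile == 0
        self.n, self.tile, self.grid = n, tile, n // tile
        # tiles[I][J] is a dense tile (list of rows) at tile coordinates (I, J).
        self.tiles = [[[[0.0] * tile for _ in range(tile)]
                       for _ in range(self.grid)]
                      for _ in range(self.grid)]

    @classmethod
    def from_rows(cls, rows, tile):
        """Copy a plain list-of-rows matrix into tiled storage."""
        m = cls(len(rows), tile)
        for i, row in enumerate(rows):
            for j, value in enumerate(row):
                m.tiles[i // tile][j // tile][i % tile][j % tile] = value
        return m

    def to_rows(self):
        """Flatten the tiled storage back into a plain list of rows."""
        t = self.tile
        return [[self.tiles[i // t][j // t][i % t][j % t]
                 for j in range(self.n)]
                for i in range(self.n)]


def tile_multiply_add(c, a, b):
    """Compute c += a * b for three dense tiles of equal size."""
    size = len(a)
    for i in range(size):
        for k in range(size):
            a_ik = a[i][k]
            for j in range(size):
                c[i][j] += a_ik * b[k][j]


def tiled_matmul(A, B):
    """Multiply two TiledMatrix objects by iterating over tile coordinates only."""
    C = TiledMatrix(A.n, A.tile)
    for I in range(A.grid):
        for J in range(A.grid):
            for K in range(A.grid):
                tile_multiply_add(C.tiles[I][J], A.tiles[I][K], B.tiles[K][J])
    return C
```

In `tiled_matmul` the outer loops address whole tiles, so the element-level subscript arithmetic (`i // tile`, `i % tile`) is confined to the container instead of being spread through the algorithm. That confinement is the readability benefit the abstract attributes to first-class tiles.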