The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Tile size selection using cache organization and data layout
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Data and computation transformations for multiprocessors
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Precise miss analysis for program transformations with caches of arbitrary associativity
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
New tiling techniques to improve cache temporal locality
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
A tile selection algorithm for data locality and cache interference
ICS '99 Proceedings of the 13th international conference on Supercomputing
Quantifying the multi-level nature of tiling interactions
International Journal of Parallel Programming
Proceedings of the 14th international conference on Supercomputing
Synthesizing transformations for locality enhancement of imperfectly-nested loop nests
Proceedings of the 14th international conference on Supercomputing
Tiling imperfectly-nested loop nests
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Tiling optimizations for 3D scientific computations
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Blocking and array contraction across arbitrarily nested loops using affine partitioning
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Space-time trade-off optimization for a class of electronic structure calculations
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Increasing temporal locality with skewing and recursive blocking
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
HiPC '01 Proceedings of the 8th International Conference on High Performance Computing
Collective Loop Fusion for Array Contraction
Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing
Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Better tiling and array contraction for compiling scientific programs
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
A high-level approach to synthesis of high-performance codes for quantum chemistry
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Global Communication Optimization for Tensor Contraction Expressions under Memory Constraints
IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
Performance optimization of a class of loops implementing multidimensional integrals
Performance optimization of a class of loops implementing multidimensional integrals
Improving effective bandwidth through compiler enhancement of global cache reuse
Journal of Parallel and Distributed Computing
Finding effective compilation sequences
Proceedings of the 2004 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Hypergraph partitioning for automatic memory hierarchy management
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Practical loop transformations for tensor contraction expressions on multi-level memory hierarchies
CC'11/ETAPS'11 Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software
Efficient search-space pruning for integrated fusion and tiling transformations
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Hi-index | 0.00 |
A very challenging issue for optimizing compilers is the phase ordering problem: In what order should a collection of compiler optimizations be performed? We address this problem in the context of optimizing a sequence of tensor contractions. The pertinent loop transformations are loop permutation, tiling, and fusion; in addition, the placement of disk I/O statements crucially affects performance. The space of possible combinations is exponentially large. We develop novel pruning strategies whereby a search problem in a larger space is replaced by a large number of searches in a much smaller space, to determine the optimal permutation, fusion, tiling and placement of disk I/O statements. Experimental results show that we obtain an improvement in I/O cost by a factor of up to 2.6 over an equi-tile-size approach.