Collective Loop Fusion for Array Contraction

Authors:
Guang R. Gao;R. Olsen;Vivek Sarkar;Radhika Thekkath
Venue:
Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing
Year:
1992

Citing 0
Cited 55

Automatic partitioning of a program dependence graph into parallel tasks

IBM Journal of Research and Development
Global optimizations for parallelism and locality on scalable parallel machines

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Communication optimization and code generation for distributed memory machines

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Optimal weighted loop fusion for parallel programs

Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Automatic selection of high-order transformations in the IBM XL FORTRAN compilers

IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
The implementation and evaluation of fusion and contraction in array languages

PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Loop fusion in high performance Fortran

ICS '98 Proceedings of the 12th international conference on Supercomputing
SMARTS: exploiting temporal locality and parallelism through vertical execution

ICS '99 Proceedings of the 13th international conference on Supercomputing
Locality optimizations for multi-level caches

SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Fast greedy weighted fusion

Proceedings of the 14th international conference on Supercomputing
Data locality enhancement by memory reduction

ICS '01 Proceedings of the 15th international conference on Supercomputing
Loop optimization for a class of memory-constrained computations

ICS '01 Proceedings of the 15th international conference on Supercomputing
Blocking and array contraction across arbitrarily nested loops using affine partitioning

PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Loop fusion for memory space optimization

Proceedings of the 14th international symposium on Systems synthesis
Space-time trade-off optimization for a class of electronic structure calculations

PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Fast Greedy Weighted Fusion

International Journal of Parallel Programming
Loop Restructuring for Data I/O Minimization on Limited On-Chip Memory Embedded Processors

IEEE Transactions on Computers
Towards Automatic Synthesis of High-Performance Codes for Electronic Structure Calculations: Data Locality Optimization

HiPC '01 Proceedings of the 8th International Conference on High Performance Computing
Improving Effective Bandwidth through Compiler Enhancement of Global Cache Reuse

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Data I/O Minimization for Loops on Limited Onchip Memory Processors

LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Optimization of Memory Usage Requirement for a Class of Loops Implementing Multi-dimensional Integrals

LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
A Framework for Loop Distribution on Limited On-Chip Memory Processors

CC '00 Proceedings of the 9th International Conference on Compiler Construction
Single Assignment C: efficient support for high-level array operations in a functional setting

Journal of Functional Programming
Improving effective bandwidth through compiler enhancement of global cache reuse

Journal of Parallel and Distributed Computing
Improving Data Locality by Array Contraction

IEEE Transactions on Computers
The Energy Impact of Aggressive Loop Fusion

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
The Potential of Computation Regrouping for Improving Locality

Proceedings of the 2004 ACM/IEEE conference on Supercomputing
New Complexity Results on Array Contraction and Related Problems

Journal of VLSI Signal Processing Systems
A case for a working-set-based memory hierarchy

Proceedings of the 2nd conference on Computing frontiers
Fast and efficient searches for effective optimization-phase sequences

ACM Transactions on Architecture and Code Optimization (TACO)
Improving Memory Hierarchy Performance through Combined Loop Interchange and Multi-Level Fusion

International Journal of High Performance Computing Applications
A polynomial-time algorithm for memory space reduction

International Journal of Parallel Programming
Integrated Loop Optimizations for Data Locality Enhancement of Tensor Contraction Expressions

SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Profitable loop fusion and tiling using model-driven empirical search

Proceedings of the 20th annual international conference on Supercomputing
On minimizing materializations of array-valued temporaries

ACM Transactions on Programming Languages and Systems (TOPLAS)
Code-size conscious pipelining of imperfectly nested loops

MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Improving the parallelism of iterative methods by aggressive loop fusion

The Journal of Supercomputing
Energy minimization with loop fusion and multi-functional-unit scheduling for multidimensional DSP

Journal of Parallel and Distributed Computing
A tuning framework for software-managed memory hierarchies

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Efficient Map Portrayal Using a General-Purpose Query Language

DEXA '09 Proceedings of the 20th International Conference on Database and Expert Systems Applications
New algorithms for SIMD alignment

CC'07 Proceedings of the 16th international conference on Compiler construction
Locality enhancement by array contraction

LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Model-guided empirical tuning of loop fusion

International Journal of High Performance Systems Architecture
Memory minimization for tensor contractions using integer linear programming

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
DESOLA: An active linear algebra library using delayed evaluation and runtime code generation

Science of Computer Programming
Parallel memory prediction for fused linear algebra kernels

ACM SIGMETRICS Performance Evaluation Review - Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10)
Accelerating computationally intensive queries on massive earth science data: (system demonstration)

Proceedings of the EDBT/ICDT 2011 Workshop on Array Databases
A cache-conscious profitability model for empirical tuning of loop fusion

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Efficient search-space pruning for integrated fusion and tiling transformations

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Generalized index-set splitting

CC'05 Proceedings of the 14th international conference on Compiler Construction
Efficient execution of time-step computations with pipelined parallelism and inter-thread data locality optimizations

Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
Memory-constrained communication minimization for a class of array computations

LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Removing impediments to loop fusion through code transformations

LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Iterative collective loop fusion

CC'06 Proceedings of the 15th international conference on Compiler Construction
Optimization techniques for efficient HTA programs

Parallel Computing

