Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution

Authors:
Ken Kennedy; Kathryn S. McKinley
Venue:
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Year:
1993

Citing: 0
Cited by: 72

Evaluating automatic parallelization for efficient execution on shared-memory multiprocessors
ICS '94 Proceedings of the 8th international conference on Supercomputing

Compiler optimizations for improving data locality
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems

SPAID: software prefetching in pointer- and call-intensive environments
Proceedings of the 28th annual international symposium on Microarchitecture

Improving data locality with loop transformations
ACM Transactions on Programming Languages and Systems (TOPLAS)

Fusion of Loops for Parallelism and Locality
IEEE Transactions on Parallel and Distributed Systems

Optimal weighted loop fusion for parallel programs
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures

Maximizing parallelism and minimizing synchronization with affine transforms
Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages

Tuning compiler optimizations for simultaneous multithreading
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture

Automatic selection of high-order transformations in the IBM XL FORTRAN compilers
IBM Journal of Research and Development - Special issue: performance analysis and its impact on design

A Compiler Optimization Algorithm for Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems

SMARTS: exploiting temporal locality and parallelism through vertical execution
ICS '99 Proceedings of the 13th international conference on Supercomputing

Locality optimizations for multi-level caches
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing

Fast greedy weighted fusion
Proceedings of the 14th international conference on Supercomputing

Tuning Compiler Optimizations for Simultaneous Multithreading
International Journal of Parallel Programming - Special issue on the 30th annual ACM/IEEE international symposium on microarchitecture, part II

Data locality enhancement by memory reduction
ICS '01 Proceedings of the 15th international conference on Supercomputing

Loop optimization for a class of memory-constrained computations
ICS '01 Proceedings of the 15th international conference on Supercomputing

Blocking and array contraction across arbitrarily nested loops using affine partitioning
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming

Loop fusion for memory space optimization
Proceedings of the 14th international symposium on Systems synthesis

Combined partitioning and data padding for scheduling multiple loop nests
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems

Functional array fusion
Proceedings of the sixth ACM SIGPLAN international conference on Functional programming

Space-time trade-off optimization for a class of electronic structure calculations
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation

Loop fusion for clustered VLIW architectures
Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems

Fast Greedy Weighted Fusion
International Journal of Parallel Programming

Quantifying the Multi-Level Nature of Tiling Interactions
International Journal of Parallel Programming

Loop Restructuring for Data I/O Minimization on Limited On-Chip Memory Embedded Processors
IEEE Transactions on Computers

Data I/O Minimization for Loops on Limited Onchip Memory Processors
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing

Optimization of Memory Usage Requirement for a Class of Loops Implementing Multi-dimensional Integrals
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing

Complexity of Multi-dimensional Loop Alignment
STACS '02 Proceedings of the 19th Annual Symposium on Theoretical Aspects of Computer Science

A Framework for Loop Distribution on Limited On-Chip Memory Processors
CC '00 Proceedings of the 9th International Conference on Compiler Construction

On the Parallel Execution Time of Tiled Loops
IEEE Transactions on Parallel and Distributed Systems

Automatic data mapping of signal processing applications
ASAP '97 Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures and Processors

Improving Data Locality by Array Contraction
IEEE Transactions on Computers

General loop fusion technique for nested loops considering timing and code size
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems

Automatic tiling of iterative stencil loops
ACM Transactions on Programming Languages and Systems (TOPLAS)

Identifying and Exploiting Spatial Regularity in Data Memory References
Proceedings of the 2003 ACM/IEEE conference on Supercomputing

New Complexity Results on Array Contraction and Related Problems
Journal of VLSI Signal Processing Systems

A case for a working-set-based memory hierarchy
Proceedings of the 2nd conference on Computing frontiers

Formal loop merging for signal transforms
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation

Generating cache hints for improved program efficiency
Journal of Systems Architecture: the EUROMICRO Journal

A polynomial-time algorithm for memory space reduction
International Journal of Parallel Programming

Deep Jam: Conversion of Coarse-Grain Parallelism to Instruction-Level and Vector Parallelism for Irregular Applications
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques

Integrated Loop Optimizations for Data Locality Enhancement of Tensor Contraction Expressions
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing

2D data locality: definition, abstraction, and application
ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design

On minimizing materializations of array-valued temporaries
ACM Transactions on Programming Languages and Systems (TOPLAS)

Expressive power of an algebra for data mining
ACM Transactions on Database Systems (TODS)

Data parallel Haskell: a status report
Proceedings of the 2007 workshop on Declarative aspects of multicore programming

MPSoC memory optimization using program transformation
ACM Transactions on Design Automation of Electronic Systems (TODAES)

Energy minimization with loop fusion and multi-functional-unit scheduling for multidimensional DSP
Journal of Parallel and Distributed Computing

Reducing memory requirements of resource-constrained applications
ACM Transactions on Embedded Computing Systems (TECS)

New algorithms for SIMD alignment
CC'07 Proceedings of the 16th international conference on Compiler construction

Loop transformations for reducing data space requirements of resource-constrained applications
SAS'03 Proceedings of the 10th international conference on Static analysis

Locality enhancement by array contraction
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing

Model-guided empirical tuning of loop fusion
International Journal of High Performance Systems Architecture

A model for fusion and code motion in an automatic parallelizing compiler
Proceedings of the 19th international conference on Parallel architectures and compilation techniques

Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis

Memory minimization for tensor contractions using integer linear programming
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing

Compiler-directed memory management for heterogeneous MPSoCs
Journal of Systems Architecture: the EUROMICRO Journal

Loop transformations: convexity, pruning and optimization
Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages

Loop Distribution and Fusion with Timing and Code Size Optimization
Journal of Signal Processing Systems

Automatic compilation of MATLAB programs for synergistic execution on heterogeneous processors
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation

Optimizing integrated application performance with cache-aware metascheduling
OTM'11 Proceedings of the 2011th Confederated international conference on On the move to meaningful internet systems - Volume Part II

A cache-conscious profitability model for empirical tuning of loop fusion
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing

Efficient search-space pruning for integrated fusion and tiling transformations
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing

Loop distribution and fusion with timing and code size optimization for embedded DSPs
EUC'05 Proceedings of the 2005 international conference on Embedded and Ubiquitous Computing

MiniTasking: improving cache performance for multiple query workloads
WAIM '06 Proceedings of the 7th international conference on Advances in Web-Age Information Management

Generalized index-set splitting
CC'05 Proceedings of the 14th international conference on Compiler Construction

Removing impediments to loop fusion through code transformations
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing

Loop fusion and reordering for register file optimization on stream processors
Journal of Systems and Software

Kernel Weaver: Automatically Fusing Database Primitives for Efficient GPU Computation
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture

Revisiting loop fusion in the polyhedral framework
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming

Triolet: a programming system that unifies algorithmic skeleton interfaces for high-performance cluster computing
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming

Beyond reuse distance analysis: Dynamic analysis for characterization of data locality potential
ACM Transactions on Architecture and Code Optimization (TACO)

Quantified Score

Hi-index	0.01
