Improving locality and parallelism in nested loops

Authors:
Michael Edward Wolf
Affiliations:
-
Venue:
Improving locality and parallelism in nested loops
Year:
1992

Citing 0
Cited 55

Global optimizations for parallelism and locality on scalable parallel machines

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Communication optimization and code generation for distributed memory machines

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Evaluating automatic parallelization for efficient execution on shared-memory multiprocessors

ICS '94 Proceedings of the 8th international conference on Supercomputing
Compiler optimizations for improving data locality

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Compiler transformations for high-performance computing

ACM Computing Surveys (CSUR)
Detecting coarse-grain parallelism using an interprocedural parallelizing compiler

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Advanced compilation techniques in the PARADIGM compiler for distributed-memory multicomputers

ICS '95 Proceedings of the 9th international conference on Supercomputing
Improving data locality with loop transformations

ACM Transactions on Programming Languages and Systems (TOPLAS)
A quantitative analysis of loop nest locality

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Combining loop transformations considering caches and scheduling

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Fusion of Loops for Parallelism and Locality

IEEE Transactions on Parallel and Distributed Systems
Maximizing parallelism and minimizing synchronization with affine transforms

Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A Compiler Optimization Algorithm for Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
New tiling techniques to improve cache temporal locality

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
An affine partitioning algorithm to maximize parallelism and minimize communication

ICS '99 Proceedings of the 13th international conference on Supercomputing
A tile selection algorithm for data locality and cache interference

ICS '99 Proceedings of the 13th international conference on Supercomputing
Compiler Analysis for Cache Coherence: Interprocedural Array Data-Flow Analysis and Its Impact on Cache Performance

IEEE Transactions on Parallel and Distributed Systems
A preprocessing step for global loop transformations for data transfer optimization

CASES '00 Proceedings of the 2000 international conference on Compilers, architecture, and synthesis for embedded systems
Exploiting Wavefront Parallelism on Large-Scale Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Data locality enhancement by memory reduction

ICS '01 Proceedings of the 15th international conference on Supercomputing
Blocking and array contraction across arbitrarily nested loops using affine partitioning

PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Register tiling in nonrectangular iteration spaces

ACM Transactions on Programming Languages and Systems (TOPLAS)
An I/O-Conscious Tiling Strategy for Disk-Resident Data Sets

The Journal of Supercomputing
Compilation of Vector Statements of C[] Language for Architectures with Multilevel Memory Hierarchy

Programming and Computer Software
Enabling unimodular transformations

Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Combining Loop Fusion with Prefetching on Shared-memory Multiprocessors

ICPP '97 Proceedings of the international Conference on Parallel Processing
Eliminating Stale Data References through Array Data-Flow Analysis

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
An Integrated Framework for Compiler-Directed Cache Coherence and Data Prefetching

LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
Compiler-Controlled Caching in Superword Register Files for Multimedia Extension Architectures

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Tiling and Memory Reuse for Sequences of Nested Loops

Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Transformations on Doubly Nested Loops

PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
Locality Enhancement for Large-Scale Shared-Memory Multiprocessors

LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
On the Parallel Execution Time of Tiled Loops

IEEE Transactions on Parallel and Distributed Systems
QR factorization for shared memory and message passing

Parallel Computing
Partitioning Loops with Variable Dependence Distances

ICPP '00 Proceedings of the 2000 International Conference on Parallel Processing
Automatic tiling of iterative stencil loops

ACM Transactions on Programming Languages and Systems (TOPLAS)
Combining Models and Guided Empirical Search to Optimize for Multiple Levels of the Memory Hierarchy

Proceedings of the international symposium on Code generation and optimization
Exploiting Inter-Processor Data Sharing for Improving Behavior of Multi-Processor SoCs

ISVLSI '05 Proceedings of the IEEE Computer Society Annual Symposium on VLSI: New Frontiers in VLSI Design
Interprocedural parallelization analysis in SUIF

ACM Transactions on Programming Languages and Systems (TOPLAS)
A polynomial-time algorithm for memory space reduction

International Journal of Parallel Programming
Facilitating the search for compositions of program transformations

Proceedings of the 19th annual international conference on Supercomputing
In search of a program generator to implement generic transformations for high-performance computing

Science of Computer Programming - Special issue on the first MetaOCaml workshop 2004
Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies

International Journal of Parallel Programming
Implicit and explicit optimizations for stencil computations

Proceedings of the 2006 workshop on Memory system performance and correctness
Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time

Proceedings of the International Symposium on Code Generation and Optimization
Buffer and Register Allocation for Memory Space Optimization

Journal of VLSI Signal Processing Systems
Finding Synchronization-Free Slices of Operations in Arbitrarily Nested Loops

ICCSA '08 Proceedings of the international conference on Computational Science and Its Applications, Part II
Extracting synchronization-free slices of operations in perfectly-nested loops

PDCS '07 Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems
Finding synchronization-free parallelism for non-uniform loops

ICCS'03 Proceedings of the 2003 international conference on Computational science: Part II
Finding coarse grained parallelism in computational geometry algorithms

ICCSA'03 Proceedings of the 2003 international conference on Computational science and its applications: Part III
Strength reduction of integer division and modulo operations

LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Adaptive prefetching for shared cache based chip multiprocessors

Proceedings of the Conference on Design, Automation and Test in Europe
Runtime biased pointer reuse analysis and its application to energy efficiency

PACS'03 Proceedings of the Third international conference on Power - Aware Computer Systems
A software approach for combating asymmetries of non-volatile memories

Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
C1C: A configurable, compiler-guided STT-RAM L1 cache

ACM Transactions on Architecture and Code Optimization (TACO)

Quantified Score

Hi-index	0.00

Abstract