Strategies for cache and local memory management by global program transformation

Authors:
Dennis Gannon;William Jalby;Kyle Gallivan
Affiliations:
-;-;-
Venue:
Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
Year:
1988

Citing 0
Cited 138

Introducing symbolic problem solving techniques in the dependence testing phases of a vectorizer

ICS '88 Proceedings of the 2nd international conference on Supercomputing
Achieving high instruction cache performance with an optimizing compiler

ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Behavioral characterization of multiprocessor memory systems: a case study

SIGMETRICS '89 Proceedings of the 1989 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
A methodology for parallelizing programs for multicomputers and complex memory multiprocessors

Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Experimentally Characterizing the Behavior of Multiprocessor Memory Systems: A Case Study

IEEE Transactions on Software Engineering
The cache performance and optimizations of blocked algorithms

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Beyond loop partitioning: data assignment and overlap to reduce communication overhead

ICS '91 Proceedings of the 5th international conference on Supercomputing
Automatic transformation of FORTRAN loops to reduce cache conflicts

ICS '91 Proceedings of the 5th international conference on Supercomputing
Experiences with data dependence abstractions

ICS '91 Proceedings of the 5th international conference on Supercomputing
A data locality optimizing algorithm

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
An architecture for software-controlled data prefetching

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Tiling multidimensional iteration spaces for nonshared memory machines

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Optimizing for parallelism and data locality

ICS '92 Proceedings of the 6th international conference on Supercomputing
Software support for speculative loads

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Design and evaluation of a compiler algorithm for prefetching

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Access normalization: loop restructuring for NUMA compilers

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
An efficient architecture for loop based data preloading

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Communication optimization and code generation for distributed memory machines

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Access normalization: loop restructuring for NUMA computers

ACM Transactions on Computer Systems (TOCS)
Managing pages in shared virtual memory systems: getting the compiler into the game

ICS '93 Proceedings of the 7th international conference on Supercomputing
Speculative prefetching

ICS '93 Proceedings of the 7th international conference on Supercomputing
Efficient simulation of caches under optimal replacement with applications to miss characterization

SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Compiling for shared-memory and message-passing computers

ACM Letters on Programming Languages and Systems (LOPLAS)
Exploiting the parallelism available in loops

Computer
Counting solutions to Presburger formulas: how and why

PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Data and program restructuring of irregular applications for cache-coherent multiprocessor

ICS '94 Proceedings of the 8th international conference on Supercomputing
Compiler optimizations for improving data locality

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Compiler transformations for high-performance computing

ACM Computing Surveys (CSUR)
Unifying data and control transformations for distributed shared-memory machines

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Tile size selection using cache organization and data layout

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Data and computation transformations for multiprocessors

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Compiler cache optimizations for banded matrix problems

ICS '95 Proceedings of the 9th international conference on Supercomputing
A limit study of local memory requirements using value reuse profiles

Proceedings of the 28th annual international symposium on Microarchitecture
SPAID: software prefetching in pointer- and call-intensive environments

Proceedings of the 28th annual international symposium on Microarchitecture
Improving data locality with loop transformations

ACM Transactions on Programming Languages and Systems (TOPLAS)
The influence of caches on the performance of heaps

Journal of Experimental Algorithmics (JEA)
Thread scheduling for cache locality

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
A quantitative analysis of loop nest locality

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
The intrinsic bandwidth requirements of ordinary programs

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Combining loop transformations considering caches and scheduling

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Fusion of Loops for Parallelism and Locality

IEEE Transactions on Parallel and Distributed Systems
Compiler techniques for data partitioning of sequentially iterated parallel loops

ICS '90 Proceedings of the 4th international conference on Supercomputing
Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading

ACM Transactions on Computer Systems (TOCS)
Non-singular data transformations: definition, validity and applications

ICS '97 Proceedings of the 11th international conference on Supercomputing
Determining the idle time of a tiling

Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A unified compiler algorithm for optimizing locality, parallelism and communication in out-of-core computations

Proceedings of the fifth workshop on I/O in parallel and distributed systems
Tuning compiler optimizations for simultaneous multithreading

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Memory data organization for improved cache performance in embedded processor applications

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Data transformations for eliminating conflict misses

PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
A hyperplane based approach for optimizing spatial locality in loop nests

ICS '98 Proceedings of the 12th international conference on Supercomputing
Eliminating conflict misses for high performance architectures

ICS '98 Proceedings of the 12th international conference on Supercomputing
An Efficient Solution to the Cache Thrashing Problem Caused by True Data Sharing

IEEE Transactions on Computers
Using generational garbage collection to implement cache-conscious data placement

Proceedings of the 1st international symposium on Memory management
Improving locality using loop and data transformations in an integrated framework

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Improving Cache Locality by a Combination of Loop and Data Transformations

IEEE Transactions on Computers - Special issue on cache memory and related problems
A Linear Algebra Framework for Automatic Determination of Optimal Data Layouts

IEEE Transactions on Parallel and Distributed Systems
Cache-conscious structure layout

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
A locality sensitive multi-module cache with explicit management

ICS '99 Proceedings of the 13th international conference on Supercomputing
Improving memory hierarchy performance for irregular applications

ICS '99 Proceedings of the 13th international conference on Supercomputing
An integer linear programming approach for optimizing cache locality

ICS '99 Proceedings of the 13th international conference on Supercomputing
Performance prediction of loop constructs on multiprocessor hierarchical-memory systems

ICS '89 Proceedings of the 3rd international conference on Supercomputing
Cache miss equations: a compiler framework for analyzing and tuning memory behavior

ACM Transactions on Programming Languages and Systems (TOPLAS)
Nonsingular Data Transformations: Definition, Validity, and Applications

International Journal of Parallel Programming
Quantifying loop nest locality using SPEC'95 and the perfect benchmarks

ACM Transactions on Computer Systems (TOCS)
Locality optimizations for multi-level caches

SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Tuning Compiler Optimizations for Simultaneous Multithreading

International Journal of Parallel Programming - Special issue on the 30th annual ACM/IEEE international symposium on microarchitecture, part II
Transforming loops to recursion for multi-level memory hierarchies

PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
A preprocessing step for global loop transformations for data transfer optimization

CASES '00 Proceedings of the 2000 international conference on Compilers, architecture, and synthesis for embedded systems
Improving Memory Traffic by Assembly-Level Exploitation of Reuses for Vector Registers

The Journal of Supercomputing
Tiling optimizations for 3D scientific computations

Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Improving fine-grained irregular shared-memory benchmarks by data reordering

Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Source-to-Source Instrumentation for the Optimization of an Automatic Reading System

The Journal of Supercomputing
Exploiting non-uniform reuse for cache optimization

Proceedings of the 2001 ACM symposium on Applied computing
Compiler-based I/O prefetching for out-of-core applications

ACM Transactions on Computer Systems (TOCS)
Data locality enhancement by memory reduction

ICS '01 Proceedings of the 15th international conference on Supercomputing
Reducing memory requirements of nested loops for embedded systems

Proceedings of the 38th annual Design Automation Conference
Dynamic management of scratch-pad memory space

Proceedings of the 38th annual Design Automation Conference
Optimal tiling for minimizing communication in distributed shared-memory multiprocessors

Compiler optimizations for scalable parallel systems
Morphable Cache Architectures: Potential Benefits

OM '01 Proceedings of the 2001 ACM SIGPLAN workshop on Optimization of middleware and distributed systems
An efficient profile-analysis framework for data-layout optimizations

POPL '02 Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Static and Dynamic Locality Optimizations Using Integer Linear Programming

IEEE Transactions on Parallel and Distributed Systems
Data Relation Vectors: A New Abstraction for Data Optimizations

IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
Compiler-directed cache polymorphism

Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
Exploiting shared scratch pad memory space in embedded multiprocessor systems

Proceedings of the 39th annual Design Automation Conference
An I/O-Conscious Tiling Strategy for Disk-Resident Data Sets

The Journal of Supercomputing
Search space definition and exploration for nonuniform data reuse opportunities in data-dominant applications

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Combining Loop Transformations Considering Caches and Scheduling

International Journal of Parallel Programming
Quantifying the Multi-Level Nature of Tiling Interactions

International Journal of Parallel Programming
Improving Memory Hierarchy Performance for Irregular Applications Using Data and Computation Reorderings

International Journal of Parallel Programming
Achieving Scalable Locality with Time Skewing

International Journal of Parallel Programming
Effective Hardware-Based Data Prefetching for High-Performance Processors

IEEE Transactions on Computers
A Loop Transformation Theory and an Algorithm to Maximize Parallelism

IEEE Transactions on Parallel and Distributed Systems
Compile-Time Techniques for Data Distribution in Distributed Memory Machines

IEEE Transactions on Parallel and Distributed Systems
The Power Test for Data Dependence

IEEE Transactions on Parallel and Distributed Systems
Communication-Free Data Allocation Techniques for Parallelizing Compilers on Multicomputers

IEEE Transactions on Parallel and Distributed Systems
Automatic Partitioning of Parallel Loops and Data Arrays for Distributed Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

IEEE Transactions on Computers
An Efficient Technique for Corner-Turn in SAR Image Reconstruction by Improving Cache Access

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Tiling and Memory Reuse for Sequences of Nested Loops

Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Improving Cache Effectiveness through Array Data Layout Manipulation in SAC

IFL '00 Selected Papers from the 12th International Workshop on Implementation of Functional Languages
Array Unification: A Locality Optimization Technique

CC '01 Proceedings of the 10th International Conference on Compiler Construction
Improving cache hit ratio by extended referencing cache lines

Journal of Computing Sciences in Colleges
Data cache locking for higher program predictability

SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Estimating cache misses and locality using stack distances

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Memory Organization for Improved Data Cache Performance in Embedded Processors

ISSS '96 Proceedings of the 9th international symposium on System synthesis
Array Regrouping and Its Use in Compiling Data-Intensive Embedded Applications

IEEE Transactions on Computers
Transforming Complex Loop Nests for Locality

The Journal of Supercomputing
A fast and accurate framework to analyze and optimize cache memory behavior

ACM Transactions on Programming Languages and Systems (TOPLAS)
Improving effective bandwidth through compiler enhancement of global cache reuse

Journal of Parallel and Distributed Computing
Efficient and Accurate Analytical Modeling of Whole-Program Data Cache Behavior

IEEE Transactions on Computers
A data locality optimizing algorithm

ACM SIGPLAN Notices - Best of PLDI 1979-1999
Optimizing the memory bandwidth with loop fusion

Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Optimizing Array-Intensive Applications for On-Chip Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Automatic blocking of QR and LU factorizations for locality

MSP '04 Proceedings of the 2004 workshop on Memory system performance
Memory Performance Optimizations For Real-Time Software HDTV Decoding

Journal of VLSI Signal Processing Systems
Performance Enhancement on Microprocessors with Hierarchical Memory Systems for Solving Large Sparse Linear Systems

International Journal of High Performance Computing Applications
Sparse Tiling for Stationary Iterative Methods

International Journal of High Performance Computing Applications
Improving Memory Hierarchy Performance through Combined Loop Interchange and Multi-Level Fusion

International Journal of High Performance Computing Applications
An accurate cost model for guiding data locality transformations

ACM Transactions on Programming Languages and Systems (TOPLAS)
Analyzing data reuse for cache reconfiguration

ACM Transactions on Embedded Computing Systems (TECS)
Reuse analysis of indirectly indexed arrays

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Improving power efficiency with compiler-assisted cache replacement

Journal of Embedded Computing - Cache exploitation in embedded systems
The rise and fall of High Performance Fortran: an historical object lesson

Proceedings of the third ACM SIGPLAN conference on History of programming languages
MPSoC memory optimization using program transformation

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Buffer and Register Allocation for Memory Space Optimization

Journal of VLSI Signal Processing Systems
Data cache locking for tight timing calculations

ACM Transactions on Embedded Computing Systems (TECS)
Communication between nested loop programs via circular buffers in an embedded multiprocessor system

SCOPES '08 Proceedings of the 11th international workshop on Software & compilers for embedded systems
Block size selection of parallel LU and QR on PVP-based and RISC-based supercomputers

CHINA HPC '07 Proceedings of the 2007 Asian technology information program's (ATIP's) 3rd workshop on High performance computing in China: solution approaches to impediments for high performance computing
Exploiting loop-dependent stream reuse for stream processors

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Reducing memory requirements of resource-constrained applications

ACM Transactions on Embedded Computing Systems (TECS)
Loop transformations for reducing data space requirements of resource-constrained applications

SAS'03 Proceedings of the 10th international conference on Static analysis
Improving data locality by chunking

CC'03 Proceedings of the 12th international conference on Compiler construction
Exploiting the reuse supplied by loop-dependent stream references for stream processors

ACM Transactions on Architecture and Code Optimization (TACO)
Compiler-guided leakage optimization for banked scratch-pad memories

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Low power engineering

Embedded Systems Design
Combining performance aspects of irregular gauss-seidel via sparse tiling

LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
PPMC: a programmable pattern based memory controller

ARC'12 Proceedings of the 8th international conference on Reconfigurable Computing: architectures, tools and applications
Minimizing accumulative memory load cost on multi-core DSPs with multi-level memory

Journal of Systems Architecture: the EUROMICRO Journal

Quantified Score

Hi-index	0.02

Strategies for cache and local memory management by global program transformation

Quantified Score

Visualization

Abstract