Software methods for improvement of cache performance on supercomputer applications

Authors:
Allan Kennedy Porterfield;K. W. Kennedy
Affiliations:
-;-
Venue:
Software methods for improvement of cache performance on supercomputer applications
Year:
1989

Citing 0
Cited 75

Improving register allocation for subscripted variables

PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Software prefetching

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
The cache performance and optimizations of blocked algorithms

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Loop distribution with arbitrary control flow

Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Data cache performance of supercomputer applications

Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Experience with interprocedural analysis of array side effects

Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Practical dependence testing

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
A data locality optimizing algorithm

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
An architecture for software-controlled data prefetching

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Comparative evaluation of latency reducing and tolerating techniques

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Hiding memory latency using dynamic scheduling in shared-memory multiprocessors

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Automatic partitioning of a program dependence graph into parallel tasks

IBM Journal of Research and Development
Delinearization: an efficient way to break multiloop dependence equations

PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Optimizing for parallelism and data locality

ICS '92 Proceedings of the 6th international conference on Supercomputing
Design and evaluation of a compiler algorithm for prefetching

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Access normalization: loop restructuring for NUMA compilers

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Compiler blockability of numerical algorithms

Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Loop distribution with multiple exits

Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Access normalization: loop restructuring for NUMA computers

ACM Transactions on Computer Systems (TOCS)
A static parameter based performance prediction tool for parallel programs

ICS '93 Proceedings of the 7th international conference on Supercomputing
Effects of memory latencies on non-blocking processor/cache architectures

ICS '93 Proceedings of the 7th international conference on Supercomputing
Unifying data and control transformations for distributed shared-memory machines

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Influence of cross-interferences on blocked loops: a case study with matrix-vector multiply

ACM Transactions on Programming Languages and Systems (TOPLAS)
Active memory: a new abstraction for memory-system simulation

Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Design of cache memories for multi-threaded dataflow architecture

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Compiler cache optimizations for banded matrix problems

ICS '95 Proceedings of the 9th international conference on Supercomputing
SPAID: software prefetching in pointer- and call-intensive environments

Proceedings of the 28th annual international symposium on Microarchitecture
Informing memory operations: providing memory performance feedback in modern processors

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Active memory: a new abstraction for memory system simulation

ACM Transactions on Modeling and Computer Simulation (TOMACS)
Fusion of Loops for Parallelism and Locality

IEEE Transactions on Parallel and Distributed Systems
Compiler-directed data prefetching in multiprocessors with memory hierarchies

ICS '90 Proceedings of the 4th international conference on Supercomputing
Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading

ACM Transactions on Computer Systems (TOCS)
Run-time adaptive cache hierarchy management via reference analysis

Proceedings of the 24th annual international symposium on Computer architecture
Run-time spatial locality detection and optimization

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Prefetching and memory system behavior of the SPEC95 benchmark suite

IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
Tolerating latency in multiprocessors through compiler-inserted prefetching

ACM Transactions on Computer Systems (TOCS)
Informing memory operations: memory performance feedback mechanisms and their applications

ACM Transactions on Computer Systems (TOCS)
Improving the memory-system performance of sparse-matrix vector multiplication

IBM Journal of Research and Development
Cache conscious programming in undergraduate computer science

SIGCSE '99 The proceedings of the thirtieth SIGCSE technical symposium on Computer science education
Improving memory hierarchy performance for irregular applications

ICS '99 Proceedings of the 13th international conference on Supercomputing
Nonlinear array layouts for hierarchical memory systems

ICS '99 Proceedings of the 13th international conference on Supercomputing
Branch Prediction, Instruction-Window Size, and Cache Size: Performance Trade-Offs and Simulation Techniques

IEEE Transactions on Computers
Cache miss equations: a compiler framework for analyzing and tuning memory behavior

ACM Transactions on Programming Languages and Systems (TOPLAS)
Run-Time Cache Bypassing

IEEE Transactions on Computers
Optimized unrolling of nested loops

Proceedings of the 14th international conference on Supercomputing
Push vs. pull: data movement for linked data structures

Proceedings of the 14th international conference on Supercomputing
Data prefetch mechanisms

ACM Computing Surveys (CSUR)
Exploiting Wavefront Parallelism on Large-Scale Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Exact analysis of the cache behavior of nested loops

Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Static and Dynamic Locality Optimizations Using Integer Linear Programming

IEEE Transactions on Parallel and Distributed Systems
Optimized Unrolling of Nested Loops

International Journal of Parallel Programming
Improving Memory Hierarchy Performance for Irregular Applications Using Data and Computation Reorderings

International Journal of Parallel Programming
Cache Profiling and the SPEC Benchmarks: A Case Study

Computer
An Implementation of Interprocedural Bounded Regular Section Analysis

IEEE Transactions on Parallel and Distributed Systems
A Loop Transformation Theory and an Algorithm to Maximize Parallelism

IEEE Transactions on Parallel and Distributed Systems
Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

IEEE Transactions on Computers
Combining Loop Fusion with Prefetching on Shared-memory Multiprocessors

ICPP '97 Proceedings of the international Conference on Parallel Processing
A Hardware Scheme for Data Prefetching

HPCN Europe 2000 Proceedings of the 8th International Conference on High-Performance Computing and Networking
The Combined Effectiveness of Unimodular Transformations, Tiling, and Software Prefetching

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
A Compiler-Directed Cache Coherence Scheme Using Data Prefetching

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Improving Effective Bandwidth through Compiler Enhancement of Global Cache Reuse

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Estimating cache misses and locality using stack distances

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Access ordering and memory-conscious cache utilization

HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Program balance and its impact on high performance RISC architectures

HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Improving effective bandwidth through compiler enhancement of global cache reuse

Journal of Parallel and Distributed Computing
Efficient and Accurate Analytical Modeling of Whole-Program Data Cache Behavior

IEEE Transactions on Computers
Improving register allocation for subscripted variables

ACM SIGPLAN Notices - Best of PLDI 1979-1999
A data locality optimizing algorithm

ACM SIGPLAN Notices - Best of PLDI 1979-1999
Tolerating memory latency through push prefetching for pointer-intensive applications

ACM Transactions on Architecture and Code Optimization (TACO)
A combined DMA and application-specific prefetching approach for tackling the memory latency bottleneck

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Reducing Cache Pollution via Dynamic Data Prefetch Filtering

IEEE Transactions on Computers
Miss Rate Prediction Across Program Inputs and Cache Configurations

IEEE Transactions on Computers
Paper: The SPEC benchmarks

Parallel Computing
New memory organizations for 3d DRAM and PCMs

ARCS'12 Proceedings of the 25th international conference on Architecture of Computing Systems
A multi-core memory organization for 3-d DRAM as main memory

ARCS'13 Proceedings of the 26th international conference on Architecture of Computing Systems

Quantified Score

Hi-index	0.02
