Improving Effective Bandwidth through Compiler Enhancement of Global Cache Reuse

Authors:
Chen Ding; Ken Kennedy
Venue:
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Year:
2001

Citing 21
Cited 23

A global approach to detection of parallelism

A global approach to detection of parallelism
Automatic decomposition of scientific programs for parallel execution

POPL '87 Proceedings of the 14th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
A practical algorithm for exact array dependence analysis

Communications of the ACM
Vector Register Allocation

IEEE Transactions on Computers
Compiler optimizations for improving data locality

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Memory bandwidth limitations of future microprocessors

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Fusion of Loops for Parallelism and Locality

IEEE Transactions on Parallel and Distributed Systems
Data-centric multi-level blocking

Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Using integer sets for data-parallel program analysis and optimization

PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Quantifying loop nest locality using SPEC'95 and the perfect benchmarks

ACM Transactions on Computer Systems (TOCS)
Fast greedy weighted fusion

Proceedings of the 14th international conference on Supercomputing
Transforming loops to recursion for multi-level memory hierarchies

PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Iteration Space Slicing for Locality

LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Inter-array Data Regrouping

LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Collective Loop Fusion for Array Contraction

Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing
The Memory Bandwidth Bottleneck and its Amelioration by a Compiler

IPDPS '00 Proceedings of the 14th International Symposium on Parallel and Distributed Processing
Reuse Distance Analysis

Reuse Distance Analysis
Optimizing supercompilers for supercomputers

Optimizing supercompilers for supercomputers
Software methods for improvement of cache performance on supercomputer applications

Software methods for improvement of cache performance on supercomputer applications
Improving effective bandwidth through compiler enhancement of global and dynamic cache reuse

Improving effective bandwidth through compiler enhancement of global and dynamic cache reuse
A study of replacement algorithms for a virtual-storage computer

IBM Systems Journal

Computation regrouping: restructuring programs for temporal data cache locality

ICS '02 Proceedings of the 16th international conference on Supercomputing
Two techniques for reconciling algorithm parallelism with memory constraints

Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
Data Sequence Locality: A Generalization of Temporal Locality

Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Better tiling and array contraction for compiling scientific programs

Proceedings of the 2002 ACM/IEEE conference on Supercomputing
References

Sourcebook of parallel computing
Array Regrouping and Its Use in Compiling Data-Intensive Embedded Applications

IEEE Transactions on Computers
Efficient and Accurate Analytical Modeling of Whole-Program Data Cache Behavior

IEEE Transactions on Computers
Restructuring computations for temporal data cache locality

International Journal of Parallel Programming
Improving Data Locality by Array Contraction

IEEE Transactions on Computers
Automatic tiling of iterative stencil loops

ACM Transactions on Programming Languages and Systems (TOPLAS)
Using Scratchpad to Exploit Object Locality in Java

ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Profitable loop fusion and tiling using model-driven empirical search

Proceedings of the 20th annual international conference on Supercomputing
Miss Rate Prediction Across Program Inputs and Cache Configurations

IEEE Transactions on Computers
A tuning framework for software-managed memory hierarchies

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Instruction balance and its relation to program energy consumption

LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Dynamic voltage and frequency scaling for scientific applications

LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Model-guided empirical tuning of loop fusion

International Journal of High Performance Systems Architecture
Exposing tunable parameters in multi-threaded numerical code

NPC'10 Proceedings of the 2010 IFIP international conference on Network and parallel computing
Memory Latency Reduction via Thread Throttling

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
A cache-conscious profitability model for empirical tuning of loop fusion

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Generalized index-set splitting

CC'05 Proceedings of the 14th international conference on Compiler Construction
Efficient execution of time-step computations with pipelined parallelism and inter-thread data locality optimizations

Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
Removing impediments to loop fusion through code transformations

LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing

Quantified Score

Hi-index	0.01
