Compiler Analysis for Cache Coherence: Interprocedural Array Data-Flow Analysis and Its Impact on Cache Performance

Authors:
Lynn Choi; Pen-Chung Yew
Affiliations:
Korea Univ., Seoul, Korea; Univ. of Minnesota, Minneapolis
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2000

Citing 21
Cited 2

Analysis of interprocedural side effects in a parallel programming environment

Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
Compiler-Directed Cache Management in Multiprocessors

Computer
The program dependence web: a representation supporting control-, data-, and demand-driven interpretation of imperative languages

PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Efficiently computing static single assignment form and the control dependence graph

ACM Transactions on Programming Languages and Systems (TOPLAS)
Comparison of hardware and software cache coherence schemes

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Detecting redundant accesses to array data

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Life span strategy—a compiler-based approach to cache coherence

ICS '92 Proceedings of the 6th international conference on Supercomputing
Cache coherence using local knowledge

Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Execution-driven tools for parallel simulation of parallel architectures and applications

Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Improving locality and parallelism in nested loops
The Stanford FLASH multiprocessor

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The Potential of Compile-Time Analysis to Adapt the Cache Coherence Enforcement Strategy to the Data Sharing Characteristics

IEEE Transactions on Parallel and Distributed Systems
Automatic array privatization and demand-driven symbolic analysis
Array SSA form and its use in parallelization

POPL '98 Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A compiler-directed cache coherence scheme with improved intertask locality

Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Design and Analysis of a Scalable Cache Coherence Scheme Based on Clocks and Timestamps

IEEE Transactions on Parallel and Distributed Systems
Eliminating Stale Data References through Array Data-Flow Analysis

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
An Exact Method for Analysis of Value-based Array Data Dependences

Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Interprocedural Array Data Flow Analysis for Cache Coherence

LCPC '95 Proceedings of the 8th International Workshop on Languages and Compilers for Parallel Computing
Program Analysis for Cache Coherence: Beyond Procedural Boundaries

ICPP '96 Proceedings of the 1996 International Conference on Parallel Processing - Volume 3
Hardware and compiler support for cache coherence in large-scale shared-memory multiprocessors

Tuning data replication for improving behavior of MPSoC applications

Proceedings of the 14th ACM Great Lakes symposium on VLSI
Affinity-Driven System Design Exploration for Heterogeneous Multiprocessor SoC

IEEE Transactions on Computers

Abstract

In this paper, we present compiler algorithms for detecting references to stale data in shared-memory multiprocessors. The algorithms combine two key analysis techniques: stale reference detection and locality-preserving analysis. Stale reference detection finds the memory reference patterns that may violate cache coherence, while locality-preserving analysis minimizes the number of such stale references by analyzing both temporal and spatial reuse. By computing the array regions referenced inside loops, we extend the previous scalar algorithms [9] to obtain a more precise analysis. We develop a full interprocedural array data-flow algorithm, which performs both bottom-up side-effect analysis and top-down context analysis on the procedure call graph to further exploit locality across procedure boundaries. The interprocedural algorithm eliminates the cache invalidations at procedure boundaries that the previous compiler algorithms [9] assumed. We have fully implemented the algorithm in the Polaris parallelizing compiler [28]. Using execution-driven simulations of the Perfect Club benchmarks, we demonstrate how automatic stale reference detection can eliminate unnecessary cache misses. The algorithm can be used to implement cache coherence in shared-memory multiprocessors that do not have hardware directories, such as the Cray T3D [21].
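
To make the idea of stale reference detection concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a single one-dimensional array, a program given as a sequence of parallel epochs, and array regions given as inclusive index ranges. It also assumes, conservatively, that any write in a later epoch may come from a different processor than the one holding a cached copy. The names `Region`, `intersect`, and `potentially_stale_reads` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: an inclusive index range [lo, hi] of one array.
Region = Tuple[int, int]


def intersect(a: Region, b: Region) -> Optional[Region]:
    """Return the overlap of two inclusive ranges, or None if they are disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def potentially_stale_reads(
    epochs: List[Dict[str, List[Region]]]
) -> List[Tuple[int, Region]]:
    """Flag reads that may hit a stale cached copy.

    A read is flagged when it overlaps a region that was accessed (and so may
    be cached) in an earlier epoch and then written in a later epoch that
    precedes the read. This is deliberately conservative: the set of stale
    regions is never shrunk.
    """
    cached: List[Region] = []  # regions that some processor may hold in cache
    stale: List[Region] = []   # cached regions overwritten in a later epoch
    flagged: List[Tuple[int, Region]] = []

    for k, epoch in enumerate(epochs):
        # Reads in this epoch are checked against writes from earlier epochs.
        for r in epoch["reads"]:
            if any(intersect(r, s) is not None for s in stale):
                flagged.append((k, r))
        # Writes in this epoch may invalidate copies cached in earlier epochs.
        for w in epoch["writes"]:
            for c in cached:
                hit = intersect(w, c)
                if hit is not None:
                    stale.append(hit)
        # Everything touched in this epoch may now be cached.
        cached.extend(epoch["reads"] + epoch["writes"])
    return flagged


example_epochs = [
    {"reads": [(0, 99)], "writes": []},            # epoch 0 reads A[0..99]
    {"reads": [], "writes": [(50, 149)]},          # epoch 1 writes A[50..149]
    {"reads": [(0, 49), (60, 70)], "writes": []},  # epoch 2 reads two regions
]
print(potentially_stale_reads(example_epochs))
```

In this example only the read of `A[60..70]` in epoch 2 is flagged, because `A[0..49]` was never overwritten after being cached. Working on regions rather than whole arrays is what lets the sketch leave the first read alone, which is the kind of precision the array-based analysis is meant to recover.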