Analysis of interprocedural side effects in a parallel programming environment
Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Efficiently computing static single assignment form and the control dependence graph
ACM Transactions on Programming Languages and Systems (TOPLAS)
Comparison of hardware and software cache coherence schemes
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Detecting redundant accesses to array data
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Life span strategy—a compiler-based approach to cache coherence
ICS '92 Proceedings of the 6th international conference on Supercomputing
Cache coherence using local knowledge
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Execution-driven tools for parallel simulation of parallel architectures and applications
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Improving locality and parallelism in nested loops
Improving locality and parallelism in nested loops
The Stanford FLASH multiprocessor
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
IEEE Transactions on Parallel and Distributed Systems
Automatic array privatization and demand-driven symbolic analysis
Automatic array privatization and demand-driven symbolic analysis
Array SSA form and its use in parallelization
POPL '98 Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A compiler-directed cache coherence scheme with improved intertask locality
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Design and Analysis of a Scalable Cache Coherence Scheme Based on Clocks and Timestamps
IEEE Transactions on Parallel and Distributed Systems
Eliminating Stale Data References through Array Data-Flow Analysis
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
An Exact Method for Analysis of Value-based Array Data Dependences
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Interprocedural Array Data Flow Analysis for Cache Coherence
LCPC '95 Proceedings of the 8th International Workshop on Languages and Compilers for Parallel Computing
(R) Program Analysis for Cache Coherence: Beyond Procedural Boundaries
ICPP '96 Proceedings of the Proceedings of the 1996 International Conference on Parallel Processing - Volume 3
Hardware and compiler support for cache coherence in large-scale shared-memory multiprocessors
Hardware and compiler support for cache coherence in large-scale shared-memory multiprocessors
Tuning data replication for improving behavior of MPSoC applications
Proceedings of the 14th ACM Great Lakes symposium on VLSI
Affinity-Driven System Design Exploration for Heterogeneous Multiprocessor SoC
IEEE Transactions on Computers
Hi-index | 0.00 |
In this paper, we present compiler algorithms for detecting references to stale data in shared-memory multiprocessors. The algorithm consists of two key analysis techniques, stale reference detection and locality preserving analysis. While the stale reference detection finds the memory reference patterns that may violate cache coherence, the locality preserving analysis minimizes the number of such stale references by analyzing both temporal and spatial reuses. By computing the regions referenced by arrays inside loops, we extend the previous scalar algorithms [9] for more precise analysis. We develop a full interprocedural array data-flow algorithm, which performs both bottom-up side-effect analysis and top-down context analysis on the procedure call graph to further exploit locality across procedure boundaries. The interprocedural algorithm eliminates cache invalidations at procedure boundaries, which were assumed in the previous compiler algorithms [9]. We have fully implemented the algorithm in the Polaris parallelizing compiler [28]. Using execution-driven simulations on Perfect Club benchmarks, we demonstrate how unnecessary cache misses can be eliminated by the automatic stale reference detection. The algorithm can be used to implement cache coherence in the shared-memory multiprocessors that do not have hardware directories, such as Cray T3D [21].