An Efficient Solution to the Cache Thrashing Problem Caused by True Data Sharing

Authors:
Guohua Jin;Zhiyuan Li;Fujie Chen
Affiliations:
Rice Univ., Houston, TX;Purdue Univ., West Lafayette, IN;Changsha Institute of Technology, Hunan, China
Venue:
IEEE Transactions on Computers
Year:
1998




Abstract

When parallel programs run on multiprocessors with private caches, the same set of data may be repeatedly used and modified by different threads. Such data sharing can often cause cache thrashing, which degrades memory performance. This paper presents and evaluates a loop restructuring method that reduces or even eliminates cache thrashing caused by true data sharing in nested parallel loops. The method relies on a compiler analysis that applies linear algebra and number theory to the subscript expressions of array references. Because the method is simple, it can be implemented efficiently in any parallel compiler. Experimental results show significant performance improvements over existing static and dynamic scheduling methods.
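
To make the problem concrete, the following toy model may help. It is not the paper's algorithm, only a minimal sketch under stated assumptions: a sequential outer loop over `i`, a parallel inner loop over `j`, and a single array reference `A[i + j]` that is read and modified. The function names (`count_transfers`, `cyclic`, `aligned`) and the loop nest itself are hypothetical and chosen for illustration. The model counts how often an element is touched by a different processor than the one that last modified it, which is roughly when the element would have to move between private caches.

```python
def count_transfers(n_outer, n_inner, n_procs, assign):
    """Count accesses to A[i + j] made by a processor other than the last one
    that touched the same element (a rough proxy for cache-to-cache moves)."""
    last_proc = {}   # element index -> processor that last modified it
    transfers = 0
    for i in range(n_outer):        # sequential outer loop
        for j in range(n_inner):    # parallel inner loop, simulated in order
            elem = i + j            # subscript expression of the reference
            proc = assign(i, j, n_procs)
            if elem in last_proc and last_proc[elem] != proc:
                transfers += 1
            last_proc[elem] = proc
    return transfers


def cyclic(i, j, n_procs):
    # Assumed baseline: static cyclic scheduling that ignores the subscript.
    return j % n_procs


def aligned(i, j, n_procs):
    # Assumed alternative: assign iterations so that all accesses to the
    # same element A[i + j] land on the same processor.
    return (i + j) % n_procs


naive = count_transfers(4, 8, 4, cyclic)
fixed = count_transfers(4, 8, 4, aligned)
print("cyclic:", naive, "aligned:", fixed)
```

In this toy configuration the cyclic assignment moves an element on every reuse, while the aligned assignment never does. A real compiler would have to derive such an assignment from the subscript expressions of all references in the loop nest and also keep the load balanced, which this sketch does not attempt.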