Advanced compiler optimizations for supercomputers
Communications of the ACM - Special issue on parallelism
Interprocedural dependence analysis and parallelization
SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
A global approach to detection of parallelism
Multilevel cache hierarchies: organizations, protocols, and performance
Journal of Parallel and Distributed Computing
Improving register allocation for subscripted variables
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Dynamic Processor Self-Scheduling for General Parallel Nested Loops
IEEE Transactions on Computers
Iteration Space Tiling for Memory Hierarchies
Proceedings of the Third SIAM Conference on Parallel Processing for Scientific Computing
Formalized methodology for data reuse exploration in hierarchical memory mappings
ISLPED '97 Proceedings of the 1997 international symposium on Low power electronics and design
An Efficient Solution to the Cache Thrashing Problem Caused by True Data Sharing
IEEE Transactions on Computers
A Software Approach to Avoiding Spatial Cache Collisions in Parallel Processor Systems
IEEE Transactions on Parallel and Distributed Systems
Power and Speed-Efficient Code Transformation of Video Compression Algorithms for RISC Processors
Journal of VLSI Signal Processing Systems - Special issue on multimedia signal processing
Formal model of data reuse analysis for hierarchical memory organizations
Proceedings of the 2006 IEEE/ACM international conference on Computer-aided design
Parallel processing systems with caches or local memories in their memory hierarchies are considered. Such systems attach a cache or local memory to each processor and usually rely on a write-invalidate protocol for cache coherence. In these systems, a problem called 'cache (or local memory) thrashing' can arise during the execution of parallel programs: data moves back and forth unnecessarily between the caches or local memories of different processors. An approach to eliminating, or at least reducing, such movement for nested parallel loops is presented. It is based on the relations between array element accesses and the indexes of the enclosing loops. These relations can be used to assign processors to the appropriate iterations of parallel loops in a loop nest, so that each iteration works on data already resident in its processor's cache or local memory. An algorithm is presented that computes, from the loop indexes of the iterations a processor has already executed, which iteration of the parallel loop that processor should execute next. The method benefits parallel code with nested loop structures in a wide range of applications; experimental results show speedups of up to 2.