Improving Memory Utilization in Cache Coherence Directories

Authors:
D. J. Lilja;P. C. Yew
Affiliations:
-;-
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1993

Abstract

Efficiently maintaining cache coherence is a major problem in large-scale shared-memory multiprocessors. Hardware directory coherence schemes have very high memory requirements, while software-directed schemes must rely on imprecise compile-time memory disambiguation. Recently proposed dynamically tagged directory schemes allocate pointers to blocks only as they are referenced, which significantly reduces their memory requirements, but they still allocate pointers to blocks that do not need them. The authors present two compiler optimizations that exploit the high-level sharing information available to the compiler to further reduce the size of a tagged directory by allocating pointers only when necessary. Trace-driven simulations are used to show that the performance of this combined hardware-software approach is comparable to other coherence schemes, but with significantly lower memory requirements. In addition, these simulations suggest that this approach is less sensitive to the quality of the memory disambiguation and interprocedural analysis performed by the compiler than software-only coherence schemes.
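
The general idea of on-demand pointer allocation can be illustrated with a toy model. The sketch below is not the authors' design: the class `TaggedDirectory`, its methods, and the `needs_coherence` flag (standing in for a compiler-supplied hint on each reference) are all hypothetical names chosen for illustration. It only shows how a directory that allocates entries on first reference, and skips allocation when a reference is marked as not needing coherence, ends up holding fewer entries.

```python
class TaggedDirectory:
    """Toy model of a dynamically tagged directory (illustrative assumption)."""

    def __init__(self):
        # Entries exist only for blocks that were actually referenced
        # with coherence enabled; key = block address, value = sharer set.
        self.entries = {}

    def read(self, cpu, block, needs_coherence=True):
        # Hypothetical compiler hint: skip pointer allocation entirely
        # when the reference is known not to need coherence.
        if needs_coherence:
            self.entries.setdefault(block, set()).add(cpu)

    def write(self, cpu, block, needs_coherence=True):
        """Return the CPUs whose cached copies would be invalidated."""
        if not needs_coherence:
            return []
        sharers = self.entries.get(block, set())
        invalidated = sorted(sharers - {cpu})
        # After the write, the writer is the only recorded sharer.
        self.entries[block] = {cpu}
        return invalidated


directory = TaggedDirectory()
directory.read(0, 100)                         # allocates an entry for block 100
directory.read(1, 100)                         # adds a second sharer
directory.read(0, 200, needs_coherence=False)  # hinted reference: no entry
```

In this model only block 100 occupies directory space, and a later write by CPU 1 to block 100 would invalidate the copy held by CPU 0.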