Minerva: An Adaptive Subblock Coherence Protocol for Improved SMP Performance

Authors:
Jeffrey B. Rothman;Alan Jay Smith
Affiliations:
-;-
Venue:
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Year:
2002

Abstract

We present a new cache protocol, Minerva, which allows the effective cache block size to vary dynamically. Minerva works using sector caches (also known as block/subblock caches). Cache consistency attributes (from the MESI set of states) are associated with each 4-byte word in the cache. Each block can itself have one of the attributes invalid, exclusive, or shared. Each block also has a current subblock size of 2^k words and a confidence value for hysteresis. The subblock size is reevaluated every time there is an external access (read or invalidate) to the block. When a fetch miss occurs within a block, a subblock equal to the current subblock size is fetched. Note that the fetch may involve a gather operation, with various words coming from different sources; some of the words may already be present.

Depending on the assumed cache sizes, block sizes, bus width, and bus timings, we find that Minerva reduces execution times by 19-40%, averaged over 12 test parallel programs. For a 64-bit wide bus, we find a consistent execution time reduction of around 30%. Our evaluation considers the utility of various other optimizations and considers the extra state bits required.
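The per-block bookkeeping described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `MinervaBlock`, `subblock_range`, and `words_to_fetch` are hypothetical, as are the 16-word block, the natural alignment of subblocks, and the rule that only invalid words need to be gathered. The policy that reevaluates the subblock size and confidence value is not given in the abstract, so it is left unmodelled.

```python
from dataclasses import dataclass, field
from enum import Enum


class WordState(Enum):
    """Per-word consistency attribute, drawn from the MESI set."""
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class BlockState(Enum):
    """Per-block attribute named in the abstract."""
    INVALID = "invalid"
    EXCLUSIVE = "exclusive"
    SHARED = "shared"


@dataclass
class MinervaBlock:
    words_per_block: int = 16   # assumption: 64-byte block of 4-byte words
    k: int = 2                  # current subblock size is 2**k words
    confidence: int = 0         # hysteresis value; update policy not modelled
    state: BlockState = BlockState.INVALID
    word_states: list = field(default_factory=list)

    def __post_init__(self):
        # Start with every word invalid unless states were supplied.
        if not self.word_states:
            self.word_states = [WordState.INVALID] * self.words_per_block

    def subblock_range(self, word_index):
        """Word indices of the subblock containing word_index.

        Assumption: subblocks are naturally aligned to their size.
        """
        size = 1 << self.k
        start = (word_index // size) * size
        return range(start, min(start + size, self.words_per_block))

    def words_to_fetch(self, word_index):
        """Words a fetch miss would gather: the subblock minus words already present."""
        return [i for i in self.subblock_range(word_index)
                if self.word_states[i] is WordState.INVALID]


block = MinervaBlock()                      # subblock size 2**2 = 4 words
block.word_states[5] = WordState.SHARED     # word 5 is already present
print(block.words_to_fetch(6))              # prints [4, 6, 7]
```

In this sketch, a miss on word 6 selects the aligned subblock of words 4 through 7 and skips word 5 because it is already valid, which mirrors the abstract's remark that some words of a fetched subblock may already be present.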