Coherency for multiprocessor virtual address caches
ASPLOS II Proceedings of the second international conference on Architectual support for programming languages and operating systems
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
A cache coherence approach for large multiprocessor systems
ICS '88 Proceedings of the 2nd international conference on Supercomputing
The effect of sharing on the cache and bus performance of parallel programs
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Evaluating the performance of four snooping cache coherency protocols
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Adjustable block size coherent caches
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Cache coherence in large-scale shared-memory multiprocessors: issues and comparisons
ACM Computing Surveys (CSUR)
The detection and elimination of useless misses in multiprocessors
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Cache write policies and performance
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The accuracy of trace-driven simulations of multiprocessors
SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Combined performance gains of simple cache protocol extensions
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Decoupled sectored caches: conciliating low tag implementation cost
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Reducing false sharing on shared memory multiprocessors through compile time data transformations
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Essential misses and data traffic in coherence protocols
Journal of Parallel and Distributed Computing - Special issue on distributed shared memory systems
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Computer architecture (2nd ed.): a quantitative approach
Computer architecture (2nd ed.): a quantitative approach
The pool of subsectors cache design
ICS '99 Proceedings of the 13th international conference on Supercomputing
The performance impact of block sizes and fetch strategies
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
False Sharing and Spatial Locality in Multiprocessor Caches
IEEE Transactions on Computers
A dynamic cache sub-block design to reduce false sharing
ICCD '95 Proceedings of the 1995 International Conference on Computer Design: VLSI in Computers and Processors
An Adaptive Update-Based Cache Coherence Protocol for Reduction of Miss Rate and Traffic
PARLE '94 Proceedings of the 6th International PARLE Conference on Parallel Architectures and Languages Europe
Experimental evaluation of on-chip microprocessor cache memories
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
A low-overhead coherence solution for multiprocessors with private cache memories
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
Two techniques for improving performance on bus-based multiprocessors
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
The Impact of Instruction-Level Parallelism on Multiprocessor Performance and Simulation Methodology
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Evaluation of cache consistency algorithm performance
MASCOTS '96 Proceedings of the 4th International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems
Multiprocessor Memory Reference Generation Using Cerberus
MASCOTS '99 Proceedings of the 7th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems
Sector Cache Design and Performance
MASCOTS '00 Proceedings of the 8th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems
Analysis of Shared Memory Misses and Reference Patterns
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
SPLASH: Stanford parallel applications for shared-memory*
SPLASH: Stanford parallel applications for shared-memory*
Minerva: An Adaptive Subblock Coherence Protocol for Improved SMP
Minerva: An Adaptive Subblock Coherence Protocol for Improved SMP
False sharing and its effect on shared memory performance
Sedms'93 USENIX Systems on USENIX Experiences with Distributed and Multiprocessor Systems - Volume 4
Structural aspects of the system/360 model 85: II the cache
IBM Systems Journal
Hi-index | 0.00 |
We present a new cache protocol, Minerva, which allows the effective cache block size to very dynamically. Minerva works using sector caches (also known as block/subblock caches). Cache consistency attributes (from the MESI set of states) are associated with each 4-byte word in the cache. Each block can itself have one of the attributes invalid, exclusive or shared. Each block also has a current subblock size, of 2k words and a confidence value for hysteresis. The subblock size is reevaluated every time there is an external access (read or invalidate) to the block. When a fetch miss occurs within a block, a subblock equal to the current subblock size is fetched. Note that the fetch may involve a gather operation, with various words coming from different sources; some of the words may already be present.Depending on the assumed cache sizes, block sizes, but width, and bus timings, we find that Minerva reduces execution times by 19-40%, averaged over 12 test parallel programs. For a 64-bit wide bus, we find a consistent execution time reduction of around 30%. Our evaluation considers the utility of various other optimizations and considers the extra state bits required.