Interconnection networks for large-scale parallel processing: theory and case studies
Interconnection networks for large-scale parallel processing: theory and case studies
Static scheduling of synchronous data flow programs for digital signal processing
IEEE Transactions on Computers
Correct memory operation of cache-based multiprocessors
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Hierarchical cache/bus architecture for shared memory multiprocessors
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
High-performance computer architecture
High-performance computer architecture
Logic verification algorithms and their parallel implementation
DAC '87 Proceedings of the 24th ACM/IEEE Design Automation Conference
An evaluation of directory schemes for cache coherence
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
The Wisconsin multicube: a new large-scale cache-coherent multiprocessor
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
ACM Computing Surveys (CSUR)
ACM Transactions on Programming Languages and Systems (TOPLAS)
Proceedings of the Tutorial and Workshop on Category Theory and Computer Programming
Experimental evaluation of on-chip microprocessor cache memories
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
An economical solution to the cache coherence problem
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
LimitLESS directories: A scalable cache coherence scheme
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Combining hardware and software cache coherence strategies
ICS '91 Proceedings of the 5th international conference on Supercomputing
A software coherence scheme with the assistance of directories
ICS '91 Proceedings of the 5th international conference on Supercomputing
Parallel program behavioral study on a shared-memory multiprocessor
ICS '91 Proceedings of the 5th international conference on Supercomputing
Modeling the performance of limited pointers directories for cache coherence
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
The Stanford Dash Multiprocessor
Computer
The DASH prototype: implementation and performance
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Adjustable block size coherent caches
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Improved multithreading techniques for hiding communication latency in multiprocessors
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Cache coherence in large-scale shared-memory multiprocessors: issues and comparisons
ACM Computing Surveys (CSUR)
A distributed shared memory multiprocessor ASURA: memory and cache architecture
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
An evaluation of directory protocols for medium-scale shared-memory multiprocessors
ICS '94 Proceedings of the 8th international conference on Supercomputing
Software-extended coherent shared memory: performance and cost
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
CRL: high-performance all-software distributed shared memory
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
In-memory directories: eliminating the cost of directories in CC-NUMAs
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
The DASH prototype: implementation and performance
25 years of the international symposia on Computer architecture (selected papers)
The directory-based cache coherence protocol for the DASH multiprocessor
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
ADir_pNB: A Cost-Effective Way to Implement Full Map Directory-Based Cache Coherence Protocols
IEEE Transactions on Computers
Hierarchical Scalable Photonic Architectures for High-Performance Processor Interconnection
IEEE Transactions on Computers
The DASH Prototype: Logic Overhead and Performance
IEEE Transactions on Parallel and Distributed Systems
Improving Memory Utilization in Cache Coherence Directories
IEEE Transactions on Parallel and Distributed Systems
A Novel Approach to Reduce L2 Miss Latency in Shared-Memory Multiprocessors
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
The Use of Prediction for Accelerating Upgrade Misses in cc-NUMA Multiprocessors
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Owner prediction for accelerating cache-to-cache transfer misses in a cc-NUMA architecture
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
A Cache Coherency Protocol for Optically Connected Parallel Computer Systems
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
IEEE Transactions on Parallel and Distributed Systems
A Two-Level Directory Architecture for Highly Scalable cc-NUMA Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Proximity-aware directory-based coherence for multi-core processor architectures
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
To Snoop or Not to Snoop: Evaluation of Fine-Grain and Coarse-Grain Snoop Filtering Techniques
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Journal of Parallel and Distributed Computing
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
A scalable organization for distributed directories
Journal of Systems Architecture: the EUROMICRO Journal
Cohesion: a hybrid memory model for accelerators
Proceedings of the 37th annual international symposium on Computer architecture
WAYPOINT: scaling coherence to thousand-core architectures
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
SPACE: sharing pattern-based directory coherence for multicore scalability
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
EUROMICRO-PDP'02 Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing
Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks
Proceedings of the 38th annual international symposium on Computer architecture
Complexity-effective multicore coherence
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Building expressive, area-efficient coherence directories
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Hi-index | 0.01 |
This paper presents an empirical evaluation of two memory-efficient directory methods for maintaining coherent caches in large shared memory multiprocessors. Both directory methods are modifications of a scheme proposed by Censier and Feautrier [5] that does not rely on a specific interconnection network and can be readily distributed across interleaved main memory. The schemes considered here overcome the large amount of memory required for tags in the original scheme in two different ways. In the first scheme each main memory block is sectored into sub-blocks for which the large tag overhead is shared. In the second scheme a limited number of large tags are stored in an associative cache and shared among a much larger number of main memory blocks. Simulations show that in terms of access time and network traffic both directory methods provide significant performance improvements over a memory system in which shared-writeable data is not cached. The large block sizes required for the sectored scheme, however, promotes sufficient false sharing that its performance is markedly worse than using a tag cache.