A Novel Approach to Reduce L2 Miss Latency in Shared-Memory Multiprocessors
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
The Use of Prediction for Accelerating Upgrade Misses in cc-NUMA Multiprocessors
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Owner prediction for accelerating cache-to-cache transfer misses in a cc-NUMA architecture
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
IEEE Transactions on Parallel and Distributed Systems
A Two-Level Directory Architecture for Highly Scalable cc-NUMA Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
A consistency architecture for hierarchical shared caches
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Distributed cooperative caching
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
A scalable organization for distributed directories
Journal of Systems Architecture: the EUROMICRO Journal
A two-level directory organization solution for CC-NUMA systems
ICA3PP'07 Proceedings of the 7th international conference on Algorithms and architectures for parallel processing
EUROMICRO-PDP'02 Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing
Evaluation of low-overhead organizations for the directory in future many-core CMPs
Euro-Par 2010 Proceedings of the 2010 conference on Parallel processing
A new hybrid directory scheme for shared memory multi-processors
CSR'06 Proceedings of the First international computer science conference on Theory and Applications
Complexity-effective multicore coherence
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Building expressive, area-efficient coherence directories
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Hi-index | 0.00 |
Abstract: The memory overhead introduced by directories constitutes a major hurdle in the scalability of cc-NUMA archiectures, which makes the shared-memory paradigm unfeasible for very large-scale systems. This work is focused on improving the scalability of shared-memory multiprocessors by significantly reducing the size of the directory. We propose multilayer clustering as an effective approach to reduce the directory-entry width. Detailed evaluation for 64 processors shows that using this approach we can drastically reduce the memory overhead, while suffering a performance degradation very similar to previous compressed schemes (such as Coarse Vector). In addition, a novel two-level directory architecture is proposed in order to eliminate the penalty caused by these compressed directories. This organization consists of a small Full-Map first- level directory (which provides precise information for the most recently referenced lines)and a compressed second- level directory (which provides in-excess information). Results show that a system with this directory architecture can achieve the same performance as a multiprocessor with a big and non-scalable Full-Map directory, with a very significant reduction of the memory overhead.