An evaluation of directory schemes for cache coherence
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
LimitLESS directories: A scalable cache coherence scheme
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
SPLASH: Stanford parallel applications for shared-memory
ACM SIGARCH Computer Architecture News
Cache Invalidation Patterns in Shared-Memory Multiprocessors
IEEE Transactions on Computers
Parallel programming in Split-C
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Cache coherence directories for scalable multiprocessors
Cache coherence directories for scalable multiprocessors
An evaluation of directory protocols for medium-scale shared-memory multiprocessors
ICS '94 Proceedings of the 8th international conference on Supercomputing
Efficient support for irregular applications on distributed-memory machines
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
The MIT Alewife machine: architecture and performance
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
STiNG: a CC-NUMA computer system for the commercial marketplace
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
An Efficient Tree Cache Coherence Protocol for Distributed Shared Memory Multiprocessors
IEEE Transactions on Computers
An empirical evaluation of two memory-efficient directory methods
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Piranha: a scalable architecture based on single-chip multiprocessing
Proceedings of the 27th annual international symposium on Computer architecture
Architecture and design of AlphaServer GS320
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
Interconnection Networks: An Engineering Approach
Interconnection Networks: An Engineering Approach
Segment Directory Enhancing the Limited Directory Cache Coherence Schemes
IPPS '99/SPDP '99 Proceedings of the 13th International Symposium on Parallel Processing and the 10th Symposium on Parallel and Distributed Processing
Using cache memory to reduce processor-memory traffic
ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture
Design and Performance of Directory Caches for Scalable Shared Memory Multiprocessors
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
A New Scalable Directory Architecture for Large-Scale Multiprocessors
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
The performance and scalability of distributed shared-memory cache coherence protocols
The performance and scalability of distributed shared-memory cache coherence protocols
High-throughput coherence control and hardware messaging in everest
IBM Journal of Research and Development
An efficient cache design for scalable glueless shared-memory multiprocessors
Proceedings of the 3rd conference on Computing frontiers
A consistency architecture for hierarchical shared caches
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Journal of Parallel and Distributed Computing
Scalable directory architecture for distributed shared memory chip multiprocessors
ACM SIGARCH Computer Architecture News
A Novel Cache Organization for Tiled Chip Multiprocessor
APPT '09 Proceedings of the 8th International Symposium on Advanced Parallel Processing Technologies
A scalable organization for distributed directories
Journal of Systems Architecture: the EUROMICRO Journal
A two-level directory organization solution for CC-NUMA systems
ICA3PP'07 Proceedings of the 7th international conference on Algorithms and architectures for parallel processing
SPACE: sharing pattern-based directory coherence for multicore scalability
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
A novel lightweight directory architecture for scalable shared-memory multiprocessors
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Manager-client pairing: a framework for implementing coherence hierarchies
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
The Journal of Supercomputing
Hi-index | 0.00 |
One important issue the designer of a scalable shared-memory multiprocessor must deal with is the amount of extra memory required to store the directory information. It is desirable that the directory memory overhead be kept as low as possible, and that it scales very slowly with the size of the machine. Unfortunately, current directory architectures provide scalability at the expense of performance. This work presents a scalable directory architecture that significantly reduces the size of the directory for large-scale configurations of a multiprocessor without degrading performance. First, we propose multilayer clustering as an effective approach to reduce the width of directory entries. Based on this concept, we derive three new compressed sharing codes, some of them with a space complexity of {\rm{O}}\left(\log_2\left(\log_{2}\left(N\right)\right)\right) for an N-node system. Then, we present a novel two-level directory architecture to eliminate the penalty caused by compressed directories in general. The proposed organization consists of a small full-map first-level directory (which provides precise information for the most recently referenced lines) and a compressed second-level directory (which provides in-excess information for all the lines). The proposals are evaluated based on extensive execution-driven simulations (using RSIM) of a 64-node cc-NUMA multiprocessor. Results demonstrate that a system with a two-level directory architecture achieves the same performance as a multiprocessor with a big and nonscalable full-map directory, with a very significant reduction of the memory overhead.