An evaluation of directory schemes for cache coherence
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
LimitLESS directories: A scalable cache coherence scheme
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
The Stanford Dash Multiprocessor
Computer
Parallel programming in Split-C
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Cache coherence directories for scalable multiprocessors
Cache coherence directories for scalable multiprocessors
An evaluation of directory protocols for medium-scale shared-memory multiprocessors
ICS '94 Proceedings of the 8th international conference on Supercomputing
Efficient support for irregular applications on distributed-memory machines
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Hitting the memory wall: implications of the obvious
ACM SIGARCH Computer Architecture News
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
STiNG: a CC-NUMA computer system for the commercial marketplace
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
An empirical evaluation of two memory-efficient directory methods
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Timestamp snooping: an approach for extending SMPs
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
Introduction to the Special Section on High Performance Memory Systems
IEEE Transactions on Computers
Segment Directory Enhancing the Limited Directory Cache Coherence Schemes
IPPS '99/SPDP '99 Proceedings of the 13th International Symposium on Parallel Processing and the 10th Symposium on Parallel and Distributed Processing
Switch Cache: A Framework for Improving the Remote Memory Access Latency of CC-NUMA Multiprocessors
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Using Switch Directories to Speed Up Cache-to-Cache Transfers in CC-NUMA Multiprocessors
IPDPS '00 Proceedings of the 14th International Symposium on Parallel and Distributed Processing
Token coherence: decoupling performance and correctness
Proceedings of the 30th annual international symposium on Computer architecture
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
IEEE Transactions on Parallel and Distributed Systems
A Two-Level Directory Architecture for Highly Scalable cc-NUMA Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
An efficient cache design for scalable glueless shared-memory multiprocessors
Proceedings of the 3rd conference on Computing frontiers
High-throughput coherence control and hardware messaging in everest
IBM Journal of Research and Development
A novel lightweight directory architecture for scalable shared-memory multiprocessors
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Hi-index | 0.00 |
In glueless shared-memory multiprocessors where cache coherence is usually maintained using a directory-based protocol, the fast access to the on-chip components (caches and network router, among others) contrasts with the much slower main memory. Unfortunately, directory-based protocols need to obtain the sharing status of every memory block before coherence actions can be performed. This information has traditionally been stored in main memory, and therefore these cache coherence protocols are far from being optimal. In this work, we propose two alternative designs for the last-level private cache of glueless shared-memory multiprocessors: the lightweight directory and the SGluM cache. Our proposals completely remove directory information from main memory and store it in the home node's L2 cache, thus reducing both the number of accesses to main memory and the directory memory overhead. The main characteristics of the lightweight directory are its simplicity and the significant improvement in the execution time for most applications. Its drawback, however, is that the performance of some particular applications could be degraded. On the other hand, the SGluM cache offers more modest improvements in execution time for all the applications by adding some extra structures that cope with the cases in which the lightweight directory fails.