The cost, complexity, and inflexibility of hardware-based directory protocols motivate us to study the performance implications of protocols that emulate directory management using software handlers executed on the compute processors. An important performance limitation of such software-only protocols is that the software latency of directory management falls on the critical memory-access path of read-miss transactions. We propose five strategies that support efficient data transfers in hardware while directory management is handled at a slower pace, in the background, by software handlers. Simulations show that this approach can remove the directory-management latency from the memory-access path. Although the directory is managed in software, the hardware mechanisms must access the memory state in order to enable data transfers at high speed. Overall, our strategies reach between 60% and 86% of the performance of the hardware-based protocol.
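As a rough illustration of why moving directory management off the critical path matters, the following toy model compares read-miss latency when the software handler runs before the data reply with the case where hardware replies first and the handler runs in the background. It is a minimal sketch, not the paper's simulator: the `Latencies` class, both function names, and all cycle counts are hypothetical values chosen only for illustration, and the model ignores contention, handler occupancy on the compute processor, and the memory-state check the hardware must perform.

```python
from dataclasses import dataclass


@dataclass
class Latencies:
    """Hypothetical cycle counts, chosen only for illustration."""
    hw_data_transfer: int = 100      # hardware path: fetch the block and send the reply
    sw_directory_handler: int = 300  # software handler that updates directory state


def read_miss_latency_software_only(lat: Latencies) -> int:
    # Software-only protocol: the handler runs before the data reply,
    # so its latency is part of the read-miss critical path.
    return lat.sw_directory_handler + lat.hw_data_transfer


def read_miss_latency_decoupled(lat: Latencies) -> int:
    # Decoupled approach: hardware sends the data right away and the
    # directory update is done afterwards, off the critical path.
    return lat.hw_data_transfer


if __name__ == "__main__":
    lat = Latencies()
    print("software-only read miss:", read_miss_latency_software_only(lat), "cycles")
    print("decoupled read miss:    ", read_miss_latency_decoupled(lat), "cycles")
```

In this simplified model the handler still costs processor time in both cases; only its position relative to the data reply changes, which is the effect the abstract attributes to the proposed strategies.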