Computer
LimitLESS directories: A scalable cache coherence scheme
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
The DASH prototype: implementation and performance
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The Stanford FLASH multiprocessor
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Tempest and typhoon: user-level shared memory
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Shasta: a low overhead, software-only approach for supporting fine-grain shared memory
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Reactive NUMA: a design for unifying S-COMA and CC-NUMA
Proceedings of the 24th annual international symposium on Computer architecture
The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
Flexible use of memory for replication/migration in cache-coherent DSM multiprocessors
Proceedings of the 25th annual international symposium on Computer architecture
Analytic evaluation of shared-memory systems with ILP processors
Proceedings of the 25th annual international symposium on Computer architecture
Memory sharing predictor: the key to a speculative coherent DSM
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
FLASH vs. (Simulated) FLASH: closing the simulation loop
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
WildFire: A Scalable Path for SMPs
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Memory profiling on shared-memory multiprocessors
Memory profiling on shared-memory multiprocessors
ATLAS: a chip-multiprocessor with transactional memory support
Proceedings of the conference on Design, automation and test in Europe
A New Solution to Coherence Problems in Multicache Systems
IEEE Transactions on Computers
Balancing Programmability and Silicon Efficiency of Heterogeneous Multicore Architectures
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.00 |
The shared memory research community has proposed many complex communication protocols that aim to eliminate specific performance bottlenecks, while still providing an easy-to-use communication interface. Although tailored protocols can eliminate some bottlenecks that arise in real applications, removing the cause of the bottleneck through software optimizations and bug fixes is cheaper to implement, faster to fix (once found), and requires no additional support by the hardware beyond a simple shared memory interface. In fact, in our experience, the choice of coherence protocol is much less important than providing an efficient hardware feedback that indentifies the source of the problem. Future cache-coherence research should focus efforts on illuminating memory system behavior, providing smarter tools to identify bottlenecks, and helping to eliminate them through software optimizations.