An evaluation of directory schemes for cache coherence
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
A cache coherence scheme with fast selective invalidation
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Efficient synchronization primitives for large-scale cache-coherent multiprocessors
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Analysis of cache invalidation patterns in multiprocessors
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
Cache considerations for multiprocessor programmers
Communications of the ACM
Trap architectures for Lisp systems
LFP '90 Proceedings of the 1990 ACM conference on LISP and functional programming
Paradigm: A Highly Scalable Shared-Memory Multicomputer Architecture
Computer - Special issue on cryptography
Algorithms for scalable synchronization on shared-memory multiprocessors
ACM Transactions on Computer Systems (TOCS)
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
LimitLESS directories: A scalable cache coherence scheme
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Comparative evaluation of latency reducing and tolerating techniques
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Comparison of hardware and software cache coherence schemes
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
The Stanford Dash Multiprocessor
Computer
SPLASH: Stanford parallel applications for shared-memory
ACM SIGARCH Computer Architecture News
Cooperative shared memory: software and hardware for scalable multiprocessor
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Mechanisms for cooperative shared memory
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The Wisconsin Wind Tunnel: virtual prototyping of parallel computers
SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Implementing a cache consistency protocol
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
A virtual machine emulator for performance evaluation
Communications of the ACM
Verifying a Multiprocessor Cache Controller Using Random Test Generation
IEEE Design & Test
The DASH Prototype: Logic Overhead and Performance
IEEE Transactions on Parallel and Distributed Systems
An economical solution to the cache coherence problem
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
CACHE COHERENCE PROTOCOLS FOR LARGE-SCALE MULTIPROCESSORS
CACHE COHERENCE PROTOCOLS FOR LARGE-SCALE MULTIPROCESSORS
Compiling for shared-memory and message-passing computers
ACM Letters on Programming Languages and Systems (LOPLAS)
Cost/performance of a parallel computer simulator
PADS '94 Proceedings of the eighth workshop on Parallel and distributed simulation
Tempest and typhoon: user-level shared memory
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Where is time spent in message-passing and shared-memory programs?
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
A comprehensive bibliography of distributed shared memory
ACM SIGOPS Operating Systems Review
Efficient strategies for software-only protocols in shared-memory multiprocessors
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Effective distributed scheduling of parallel workloads
Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Modeling cost/performance of a parallel computer simulator
ACM Transactions on Modeling and Computer Simulation (TOMACS)
High-performance sorting on networks of workstations
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Retrospective: tempest and typhoon: user-level shared memory
25 years of the international symposia on Computer architecture (selected papers)
Tempest and typhoon: user-level shared memory
25 years of the international symposia on Computer architecture (selected papers)
Performance of database workloads on shared-memory systems with out-of-order processors
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Tolerating late memory traps in ILP processors
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Memory sharing predictor: the key to a speculative coherent DSM
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Selective, accurate, and timely self-invalidation using last-touch prediction
Proceedings of the 27th annual international symposium on Computer architecture
Implicit coscheduling: coordinated scheduling with implicit information in distributed systems
ACM Transactions on Computer Systems (TOCS)
Application-specific protocols for user-level shared memory
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Distributed Shared Memory: Concepts and Systems
IEEE Parallel & Distributed Technology: Systems & Technology
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Exploiting high-level coherence information to optimize distributed shared state
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Inferential queueing and speculative push for reducing critical communication latencies
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Using cache optimizing compiler for managing software cache on distributed shared memory system
HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
Token coherence: decoupling performance and correctness
Proceedings of the 30th annual international symposium on Computer architecture
Tolerating Late Memory Traps in Dynamically Scheduled Processors
IEEE Transactions on Computers
Coherence decoupling: making use of incoherence
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Journal of Systems Architecture: the EUROMICRO Journal
A localizing directory coherence protocol
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Shared memory computing on clusters with symmetric multiprocessors and system area networks
ACM Transactions on Computer Systems (TOCS)
Inferential queueing and speculative push
International Journal of Parallel Programming - Special issue I: The 17th annual international conference on supercomputing (ICS'03)
Circulating shared-registers for multiprocessor systems
Journal of Systems Architecture: the EUROMICRO Journal
ACM Transactions on Computer Systems (TOCS)
Cohesion: a hybrid memory model for accelerators
Proceedings of the 37th annual international symposium on Computer architecture
WAYPOINT: scaling coherence to thousand-core architectures
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Exploiting locality: a flexible DSM approach
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Scaling up the preventive replication of autonomous databases in cluster systems
VECPAR'04 Proceedings of the 6th international conference on High Performance Computing for Computational Science
Why on-chip cache coherence is here to stay
Communications of the ACM
A new perspective for efficient virtual-cache coherence
Proceedings of the 40th Annual International Symposium on Computer Architecture
Hi-index | 0.02 |
We believe the paucity of massively parallel, shared-memory machines follows from the lack of a shared-memory programming performance model that can inform programmers of the cost of operations (so they can avoid expensive ones) and can tell hardware designers which cases are common (so they can build simple hardware to optimize them). Cooperative shared memory, our approach to shared-memory design, addresses this problem.Our initial implementation of cooperative shared memory uses a simple programming model, called Check-In/Check-Out (CICO), in conjunction with even simpler hardware, called Dir1SW. In CICO, programs bracket uses of shared data with a check_in directive terminating the expected use of the data. A cooperative prefetch directive helps hide communication latency. Dir1SW is a minimal directory protocol that adds little complexity to message-passing hardware, but efficiently supports programs written within the CICO model.