Maintaining cache coherence is becoming one of the most serious problems in the design of today's machines. The problem was initially relatively simple: the interconnection network of Symmetric MultiProcessors (SMP) was an atomic bus, which simplified the implementation of invalidation-based coherence protocols. However, because of increasing bandwidth demand, atomic buses have progressively been replaced by split buses, which decouple the request and response phases of a transaction. Split buses allow new requests to be initiated before the responses to those already in progress have been received, but they complicate the preservation of coherence. Indeed, a new request causes a conflict when it concerns a block address involved in another outstanding request and at least one of the two requests is a WRITE miss. Several solutions to this problem exist. The one used in the SGI machine relies on a shared data bus on which the completion of transactions can be traced. Unfortunately, it becomes impracticable in recent machines, which replace data buses with more efficient networks (again because of bandwidth constraints), ultimately with a crossbar. This work describes and quantitatively evaluates two possible solutions to the coherence problem for these new architectures, in which not every data response can be traced by every processor.
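
The conflict condition stated in the abstract (same block address, and at least one of the two requests is a WRITE miss) can be illustrated with a minimal Python sketch. It is not the mechanism evaluated in this work: the `Request` type, its fields, and the `find_conflicts` helper are hypothetical names introduced only for illustration, and the list of outstanding requests stands in for whatever structure a real split-bus controller would use to track transactions in progress.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Request:
    """Hypothetical model of a bus request (names are assumptions)."""
    proc: int       # id of the requesting processor
    block: int      # block address the request concerns
    is_write: bool  # True for a WRITE miss, False for a READ miss


def find_conflicts(new: Request, outstanding: List[Request]) -> List[Request]:
    """Return the outstanding requests that conflict with `new`.

    Two requests conflict when they concern the same block address
    and at least one of them is a WRITE miss.
    """
    return [
        r for r in outstanding
        if r.block == new.block and (r.is_write or new.is_write)
    ]


if __name__ == "__main__":
    pending = [Request(0, 0x40, False), Request(1, 0x80, True)]
    # Two READ misses to the same block do not conflict.
    print(find_conflicts(Request(2, 0x40, False), pending))
    # A WRITE miss to a block with a pending READ miss does.
    print(find_conflicts(Request(2, 0x40, True), pending))
```

Under this reading, two concurrent READ misses to the same block are harmless, which is why the check needs the request type as well as the block address.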