A new cache coherence solution is proposed for an on-chip multiprocessor operating at over 500 MHz and built with advanced VLSI technology. To reduce shared-bus transaction time, a central coherence unit (CCU) is introduced. The CCU controls all shared-bus transactions, monitors all cache tags every clock cycle, and executes a bus transaction in four clock cycles, whereas a conventional bus mechanism requires eight. A new cache coherence protocol (CRAC) is also introduced to reduce external memory accesses. The CRAC protocol makes it possible to load the desired data from any cache holding a copy, and to transfer write-back responsibility to another cache holding a copy. An implementation of the CCU and CRAC is presented and evaluated using a cycle-based multiprocessor simulator. Simulation results show that introducing the CCU and CRAC is effective in reducing shared-bus traffic and total execution time. Furthermore, the proposed multiprocessor model with the CCU and CRAC is shown to be more scalable than a conventional multiprocessor model.
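The two ideas attributed to CRAC above (supplying data from any cache that holds a copy, and handing write-back responsibility to another holder) can be illustrated with a toy model of a single cache line. This is only a sketch under stated assumptions, not the protocol as specified in the paper: the class `ToyLine`, its fields, and the write-invalidate behaviour on stores are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyLine:
    """One memory line shared by several caches (hypothetical model, not CRAC itself)."""
    memory_value: int = 0
    copies: Dict[int, int] = field(default_factory=dict)  # cache id -> cached value
    owner: Optional[int] = None  # cache responsible for write-back, if the line is dirty
    memory_reads: int = 0        # external memory reads performed
    memory_writes: int = 0       # external memory write-backs performed

    def read(self, cid: int) -> int:
        """Load the line into cache `cid`, preferring a cache-to-cache transfer."""
        if cid in self.copies:
            return self.copies[cid]
        if self.copies:
            # Any cache holding a copy may supply the data; no memory access needed.
            value = next(iter(self.copies.values()))
        else:
            self.memory_reads += 1
            value = self.memory_value
        self.copies[cid] = value
        return value

    def write(self, cid: int, value: int) -> None:
        """Store to the line. Assumption: other copies are invalidated."""
        self.copies = {cid: value}
        self.owner = cid

    def evict(self, cid: int) -> None:
        """Drop the line from cache `cid`, handing off ownership when possible."""
        if cid not in self.copies:
            return
        value = self.copies.pop(cid)
        if self.owner != cid:
            return
        if self.copies:
            # Another cache still holds the line: transfer write-back responsibility.
            self.owner = next(iter(self.copies))
        else:
            # Last copy is leaving: the dirty data must finally reach memory.
            self.memory_writes += 1
            self.memory_value = value
            self.owner = None
```

In this toy model, a line that is written by cache 0, read by cache 1, and then evicted from cache 0 causes no write to external memory; the write-back happens only when the last holder evicts the line. That is the kind of external memory traffic reduction the abstract describes, though the real protocol's states and bus signalling are not given here.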