In this paper we use trace-driven simulation to analyze the memory reference patterns of write-shared data in several parallel applications. We first develop a characterization of write sharing, based on the notion of a write run, and then examine the traces using metrics derived from that characterization. The results indicate that the amount of write sharing in all programs is small, and that it is characterized by short to medium sequences of per-processor references, with little contention for either data or locks.

We then determine to what extent this analysis can be used to predict the coherency overhead of write-invalidate and write-broadcast protocols. We develop a simple model of write sharing from the write run characterization. By applying the results of the sharing analysis to the model, weighted by machine-specific cycle costs for carrying out coherency-related bus operations, we can estimate relative protocol performance. We compare these estimates to the results of detailed architectural simulations.

The simulation results indicate that (1) neither protocol dominates in performance, and (2) the write run model is a good predictor of protocol performance when the unit of the coherency operations matches that in the sharing analysis. This is the case for the write-broadcast protocols, in which one word is broadcast for each write to shared data. However, in Berkeley Ownership, a write-invalidate protocol, the unit of coherency is an entire cache block. When the block size is large, performance for this protocol is quite sensitive to the memory reference patterns within the block.
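
The write run model described in the abstract might be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation. It assumes a write run begins when a processor writes a shared address and ends when a different processor reads or writes that address. It also assumes a write-invalidate protocol pays one invalidation plus one refill miss per write run, while a write-broadcast protocol pays one broadcast per write to shared data. The function names, the trace format, and the cycle costs are all hypothetical.

```python
def write_runs(trace):
    """Return the number of writes in each write run found in the trace.

    trace: iterable of (processor_id, op, address), op is "r" or "w".
    The trace is assumed to contain only references to shared data.
    Assumed definition: a run starts with a write by one processor and
    ends at the first access to that address by another processor.
    """
    current = {}  # address -> [owning processor, writes in the open run]
    runs = []
    for proc, op, addr in trace:
        run = current.get(addr)
        if run is not None and run[0] != proc:
            runs.append(run[1])  # access by another processor ends the run
            del current[addr]
            run = None
        if op == "w":
            if run is None:
                current[addr] = [proc, 1]  # a write opens a new run
            else:
                run[1] += 1  # same processor keeps writing
    # Simplification: runs still open at the end of the trace are counted too.
    runs.extend(run[1] for run in current.values())
    return runs


def estimate_overhead(runs, invalidate_cost, refill_cost, broadcast_cost):
    """Estimate coherency cycles for both protocol families (assumed costs)."""
    # Write-invalidate: one invalidation and one later refill per write run.
    invalidate = len(runs) * (invalidate_cost + refill_cost)
    # Write-broadcast: one word broadcast for each write to shared data.
    broadcast = sum(runs) * broadcast_cost
    return invalidate, broadcast


if __name__ == "__main__":
    trace = [(0, "w", 0x10), (0, "w", 0x10), (0, "r", 0x10),
             (1, "r", 0x10), (1, "w", 0x10), (0, "w", 0x10)]
    runs = write_runs(trace)
    print(runs)
    print(estimate_overhead(runs, invalidate_cost=2, refill_cost=10,
                            broadcast_cost=3))
```

Under these assumptions, long write runs favor write-invalidate because the per-run cost is spread over many writes, while short runs favor write-broadcast. The sketch works at word granularity, so it does not capture the block-size sensitivity the abstract reports for Berkeley Ownership.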