In this paper we evaluate two shared-cache architectures for small-scale multiprocessors. We vary shared cache sizes from 8MB to 1GB, under various block sizes, cache organizations and sizes, and strategies for I/O transactions. We use 12 bus trace samples obtained during the execution of a 100GB TPC-H benchmark on an eight-way multiprocessor.

To deal with the cold-start misses at the beginning of each sample, we identify the sure misses, which are known to be misses in the full trace. The difference between the total number of misses and the number of sure misses is the zone of uncertainty: these references may be hits or misses in the full trace. It turns out that the zone of uncertainty is small enough in most cases that useful conclusions can be drawn.

Our conclusions are that a single-cluster configuration with a shared cache, even a very small one, can be very effective for TPC-H. We also show that the coherence traffic between shared caches in a multiple-cluster system is very high in the context of TPC-H.
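The split between sure misses and the zone of uncertainty can be illustrated with a minimal sketch. It assumes an LRU set-associative cache that starts empty at the beginning of a sample, and it uses one common cold-start rule: a miss in a set that the sample has already filled is a sure miss, while a miss in a set that is not yet full is uncertain. The abstract does not spell out the classification rule, so this rule and the names `classify_misses` and `miss_ratio_bounds` are assumptions made for illustration only.

```python
from collections import OrderedDict


def classify_misses(block_addresses, num_sets, ways):
    """Simulate an initially empty LRU set-associative cache on one sample.

    Returns (hits, sure_misses, uncertain_misses).
    Assumed rule (not taken from the abstract): once a set holds `ways`
    blocks referenced inside the sample, its LRU content matches the
    full-trace content, so any later miss in that set is a sure miss.
    """
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = sure = uncertain = 0
    for block in block_addresses:
        lru = sets[block % num_sets]
        if block in lru:
            lru.move_to_end(block)      # refresh recency on a hit
            hits += 1
            continue
        if len(lru) == ways:
            sure += 1                   # set already filled by the sample
            lru.popitem(last=False)     # evict the least recently used block
        else:
            uncertain += 1              # cold set: outcome unknown in full trace
        lru[block] = True
    return hits, sure, uncertain


def miss_ratio_bounds(hits, sure, uncertain):
    """Lower and upper bounds on the sample's miss ratio in the full trace."""
    total = hits + sure + uncertain
    return sure / total, (sure + uncertain) / total


# Hypothetical toy sample: block addresses for a direct-mapped cache with 2 sets.
counts = classify_misses([0, 2, 0, 1, 1, 3], num_sets=2, ways=1)
low, high = miss_ratio_bounds(*counts)
```

In this toy sample the two first-touch references fall in the zone of uncertainty, so the true miss ratio lies between `low` and `high`. With long samples and realistic cache sizes the gap between the two bounds depends on how quickly the sample fills the cache.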