Software—Practice & Experience
Evaluating cache-based systems demands careful simulation of entire benchmarks, so simulation efficiency is essential to realistic evaluation. For systems with large caches and large numbers of processors, simulation is often too slow to be practical. In particular, optimizing the design of a cache for a multiprocessor is very complex with current techniques.

This paper addresses these problems. First, we introduce necessary and sufficient conditions for cache inclusion in systems with invalidations. Second, under cache inclusion, we show that an accurate trace for a given processor or for a cluster of processors can be extracted from a multiprocessor trace. With this methodology, candidate cache architectures for a processor or for a cluster of processors are evaluated independently of the rest of the system, which drastically reduces trace length and simulation complexity. Moreover, many important system-wide metrics can be estimated with good accuracy by extracting the traces of a set of randomly selected processors, an approach we call processor sampling. We demonstrate the accuracy and efficiency of these techniques by applying them to three 64-processor traces.
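The abstract does not spell out the extraction rule, so the following is only a minimal sketch of what trace extraction and processor sampling might look like, not the paper's algorithm. It assumes a write-invalidate protocol in which every write by a processor outside the selected set invalidates that block in the selected processors' caches. The record layout `Ref`, the marker `"I"`, and the function names `extract_trace` and `sample_processors` are all hypothetical.

```python
import random
from collections import namedtuple

# Hypothetical trace record: issuing processor, operation ("R" or "W"), block address.
Ref = namedtuple("Ref", "cpu op block")


def extract_trace(trace, cpus):
    """Reduce a multiprocessor trace to what one processor (or cluster) observes.

    Assumption: write-invalidate coherence. References issued by `cpus` are
    kept unchanged. A write by any other processor is kept as an invalidation
    marker ("I"), but only if the selected processors have already referenced
    that block; a block they never touched cannot be in their caches.
    """
    cpus = set(cpus)
    seen = set()   # blocks referenced so far by the selected processors
    reduced = []
    for ref in trace:
        if ref.cpu in cpus:
            seen.add(ref.block)
            reduced.append(ref)
        elif ref.op == "W" and ref.block in seen:
            reduced.append(Ref(ref.cpu, "I", ref.block))
    return reduced


def sample_processors(num_cpus, k, seed=0):
    """Pick k distinct processors uniformly at random (processor sampling)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_cpus), k))


# Tiny illustrative trace, not real data.
trace = [
    Ref(0, "R", 1),
    Ref(1, "W", 1),   # remote write to a block processor 0 has referenced
    Ref(1, "W", 2),   # remote write to a block processor 0 never touched
    Ref(0, "W", 3),
]
reduced = extract_trace(trace, [0])
chosen = sample_processors(64, 8)
```

Each reduced trace can then drive a single-cache (or single-cluster) simulator on its own. Averaging a metric such as miss rate over the sampled processors gives an estimate of the system-wide value, and how accurate that estimate is depends on the workload.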