Computer Architecture, Fourth Edition: A Quantitative Approach
Computer Architecture, Fourth Edition: A Quantitative Approach
The AMD Opteron Northbridge Architecture
IEEE Micro
Memory hierarchy performance measurement of commercial dual-core desktop processors
Journal of Systems Architecture: the EUROMICRO Journal
Inter-core prefetching for multicore processors using migrating helper threads
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Memory system performance in a NUMA multicore multiprocessor
Proceedings of the 4th Annual International Conference on Systems and Storage
Proceedings of the international symposium on Memory management
Simultaneous multithreading on x86_64 systems: an energy efficiency evaluation
HotPower '11 Proceedings of the 4th Workshop on Power-Aware Computing and Systems
Memory Performance And SPEC OpenMP scalability on quad-socket x86 64 systems
ICA3PP'11 Proceedings of the 11th international conference on Algorithms and architectures for parallel processing - Volume Part I
Memory performance at reduced CPU clock speeds: an analysis of current x86_64 processors
HotPower'12 Proceedings of the 2012 USENIX conference on Power-Aware Computing and Systems
Improving Cache Management Policies Using Dynamic Reuse Distances
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Modeling communication in cache-coherent SMP systems: a case-study with Xeon Phi
Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
Everything you always wanted to know about synchronization but were afraid to ask
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
Hi-index | 0.00 |
Across a broad range of applications, multicore technology is the most important factor that drives today's microprocessor performance improvements. Closely coupled is a growing complexity of the memory subsystems with several cache levels that need to be exploited efficiently to gain optimal application performance. Many important implementation details of these memory subsystems are undocumented. We therefore present a set of sophisticated benchmarks for latency and bandwidth measurements to arbitrary locations in the memory subsystem. We consider the coherency state of cache lines to analyze the cache coherency protocols and their performance impact. The potential of our approach is demonstrated with an in-depth comparison of ccNUMA multiprocessor systems with AMD (Shanghai) and Intel (Nehalem-EP) quad-core x86-64 processors that both feature integrated memory controllers and coherent point-to-point interconnects. Using our benchmarks we present fundamental memory performance data and architectural properties of both processors. Our comparison reveals in detail how the microarchitectural differences tremendously affect the performance of the memory subsystem.