Even with today's large caches, the widening performance gap between processors and memory systems creates a memory bottleneck for many important scientific and commercial applications. In shared-memory multiprocessors, contention and cache-coherence effects intensify this bottleneck: under heavy memory contention, memory latency may increase two- to threefold. At the same time, as more sophisticated techniques are used to hide latency and increase bandwidth, measuring memory performance has become increasingly difficult. Previous simple methods can overestimate uniprocessor memory latency and underestimate bandwidth by tens of percent. This paper introduces a microbenchmark suite that measures memory hierarchy performance in light of both uniprocessor optimizations and the contention and coherence effects of multiprocessors. The benchmark suite has been used to improve the memory system performance of the SGI Origin multiprocessor.
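To make the idea of a memory latency microbenchmark concrete, here is a minimal sketch of the common pointer-chasing technique. It is not the suite described above. It is a generic illustration, and the names `build_chain`, `chase`, and `ns_per_load` are hypothetical. It assumes a POSIX system (for `clock_gettime`) and a working set much larger than the last-level cache. Because each load address depends on the previous load's result, the processor cannot overlap the accesses, so the average time per step approximates dependent-load latency rather than bandwidth.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Small LCG so the shuffle does not depend on RAND_MAX (illustrative only). */
static uint64_t lcg_next(uint64_t *state)
{
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    return *state >> 33; /* keep the better-mixed high bits */
}

/* Fill next[] with a random single-cycle permutation (Sattolo's algorithm):
 * starting at index 0 and following next[] visits every slot exactly once
 * before returning to 0, which defeats simple stride prefetchers. */
static void build_chain(size_t *next, size_t n, uint64_t seed)
{
    assert(n >= 2);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(lcg_next(&seed) % i); /* j in [0, i-1] */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Follow the chain for `steps` dependent loads and return the final index. */
static size_t chase(const size_t *next, size_t steps)
{
    size_t p = 0;
    for (size_t s = 0; s < steps; s++)
        p = next[p];
    return p;
}

/* Time the chase and return average nanoseconds per load. The final index
 * is written to *sink so the compiler cannot discard the loop. */
static double ns_per_load(const size_t *next, size_t steps, size_t *sink)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    *sink = chase(next, steps);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}
```

A caller would allocate an array of `size_t`, call `build_chain`, and then call `ns_per_load` with a step count of several times the array length. This sketch ignores effects that a careful measurement would need to control, such as several array elements sharing one cache line, TLB misses, and interference from other processors.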