Because of the continuous trend towards higher core counts, parallelization is becoming mandatory for many application domains beyond the traditional HPC sector. Current commodity servers comprise up to 48 processor cores in configurations with only four sockets. These shared memory systems have distinct NUMA characteristics: the location of data within the memory system significantly affects both access latency and bandwidth. NUMA-aware memory allocation and scheduling are therefore highly performance-relevant issues. In this paper we use low-level microbenchmarks to compare two state-of-the-art quad-socket systems with x86_64 processors from AMD and Intel. We then investigate the performance of the application-based OpenMP benchmark suite SPEC OMPM2001. Our analysis shows how these benchmarks scale on shared memory systems with up to 48 cores and how their scalability correlates with the previously determined characteristics of the memory hierarchy. Furthermore, we demonstrate how the processor interconnects influence the benchmark results.