Optimizing OpenMP parallelized DGEMM calls on SGI Altix 3700

Authors:
Daniel Hackenberg;Robert Schöne;Wolfgang E. Nagel;Stefan Pflüger
Affiliations:
Center for Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Dresden, Germany;Center for Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Dresden, Germany;Center for Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Dresden, Germany;Center for Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Dresden, Germany
Venue:
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Year:
2006

Citing 1
Cited 3

Performance Analysis with BenchIT: Portable, Flexible, Easy to Use

QEST '04 Proceedings of the Quantitative Evaluation of Systems, First International Conference

OpenMP parallelism for fluid and fluid-particulate systems

Parallel Computing

Cache-sensitive MapReduce DGEMM algorithms for shared memory architectures

Proceedings of the South African Institute for Computer Scientists and Information Technologists Conference

Fast parallel algorithms for blocked dense matrix multiplication on shared memory architectures

ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I

Abstract

Calling functions from parallelized mathematical libraries is a common way to accelerate numerical applications. Computer architectures with shared memory characteristics support different approaches for implementing such libraries, usually OpenMP or MPI. This paper compares the performance of DGEMM calls (double-precision floating-point matrix multiplication) across different OpenMP parallelized numerical libraries, namely Intel MKL and SGI SCSL, and shows how these calls can be optimized. We also examine the memory placement policy and give hints for initializing data. The study focuses on an SGI Altix 3700 Bx2 system and uses BenchIT [1] as a convenient performance measurement suite.
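
To make the remark about memory placement and data initialization concrete, here is a minimal C sketch. It is not taken from the paper. It assumes a first-touch placement policy (a page is placed on the node of the thread that first writes it), and the names alloc_first_touch and dgemm_sketch are hypothetical helpers introduced only for illustration. The naive triple loop stands in for a library DGEMM call and says nothing about how Intel MKL or SGI SCSL work internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocate an n x n row-major matrix and fill it with
 * `value`. The fill loop is parallelized with the same static row partition
 * as the multiply below, so under a first-touch policy (an assumption of
 * this sketch) each thread writes, and thereby places, the rows it owns. */
double *alloc_first_touch(size_t n, double value)
{
    assert(n > 0);
    double *m = malloc(n * n * sizeof *m);
    if (m == NULL)
        return NULL;

    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; i++)
        for (size_t j = 0; j < n; j++)
            m[(size_t)i * n + j] = value;

    return m;
}

/* Hypothetical stand-in for DGEMM: C = alpha * A * B + beta * C for
 * n x n row-major matrices. Each thread computes a block of rows of C and
 * reads the matching rows of A; B is read in full by every thread. */
void dgemm_sketch(size_t n, double alpha, const double *a, const double *b,
                  double beta, double *c)
{
    assert(n > 0);

    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; i++) {
        size_t row = (size_t)i * n;
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[row + k] * b[k * n + j];
            c[row + j] = alpha * sum + beta * c[row + j];
        }
    }
}
```

A caller would create A, B, and C with alloc_first_touch and then call dgemm_sketch (or, in a real application, the library's DGEMM routine). Built without an OpenMP flag, the pragmas are ignored and the code runs serially with the same result. Whether parallel initialization helps on a given machine depends on its actual placement policy and on how the library distributes work, so it should be measured rather than assumed.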