Algorithms for SMP-Clusters Dense Matrix-Vector Multiplication

Authors:
Martin Schmollinger; Michael Kaufmann
Venue:
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Year:
2002

Abstract

Clusters of symmetric multiprocessor (SMP) nodes are among the most important parallel architectures, both now and in the future. The architecture consists of shared-memory nodes, each with multiple processors, connected by a fast interconnection network. New programming models try to exploit this architecture by using threads within the nodes and message-passing libraries for internode communication. To develop efficient algorithms, it is necessary to consider the hybrid nature of both the architecture and the programming models. In this paper, we present a methodology for designing efficient algorithms for SMP clusters on top of the κNUMA model. The κNUMA model is a computational model that extends the bulk-synchronous parallel (BSP) model with the characteristics of SMP clusters. The κNUMA methodology is a top-down method that proposes developing an optimal overall algorithm by developing optimal algorithms for each level of the machine hierarchy. We use dense matrix-vector multiplication as the example problem. The theoretical results of our analysis are verified in practice: we report results of experiments performed on a Linux cluster of dual Pentium III nodes.
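
The two-level structure described in the abstract (threads inside a node, message passing between nodes) can be illustrated with a minimal Python sketch of a row-partitioned dense matrix-vector product. This is not the authors' algorithm and does not reproduce their cost analysis. All function names are hypothetical, the row-block decomposition is an assumed choice, and the inter-node step is only simulated in a single process, where a real implementation would use a message-passing library.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(n_rows, parts):
    """Return `parts` contiguous (start, stop) row ranges covering 0..n_rows."""
    base, extra = divmod(n_rows, parts)
    ranges, start = [], 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def dot_rows(A, x, start, stop):
    """Multiply rows start..stop-1 of A by x (the work done by one thread)."""
    return [sum(a * b for a, b in zip(A[i], x)) for i in range(start, stop)]


def node_matvec(A_block, x, threads_per_node):
    """Node level: threads share A_block and x; each computes a row slice."""
    ranges = split_rows(len(A_block), threads_per_node)
    with ThreadPoolExecutor(max_workers=threads_per_node) as pool:
        parts = pool.map(lambda r: dot_rows(A_block, x, *r), ranges)
        return [y for part in parts for y in part]


def cluster_matvec(A, x, nodes, threads_per_node):
    """Cluster level: one row block per node, then gather the partial results.

    Assumption: the loop over nodes stands in for independent processes,
    and the list concatenation stands in for a message-passing gather.
    """
    y = []
    for start, stop in split_rows(len(A), nodes):
        y.extend(node_matvec(A[start:stop], x, threads_per_node))
    return y
```

Calling `cluster_matvec(A, x, nodes=2, threads_per_node=2)` on a small list-of-lists matrix gives the same vector as a sequential product. The sketch shows only the hierarchical decomposition, with each level solving its own subproblem, and makes no claim about performance.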