MPI Reduction Operations for Sparse Floating-point Data

Authors:
Michael Hofmann;Gudula Rünger
Affiliations:
Department of Computer Science, Chemnitz University of Technology, Germany;Department of Computer Science, Chemnitz University of Technology, Germany
Venue:
Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Year:
2008

Citing 7

New Techniques for Collective Communications in Clusters: A Case Study with MPI
ICPP '02 Proceedings of the 2001 International Conference on Parallel Processing

Pipelining and Overlapping for MPI Collective Operations
LCN '03 Proceedings of the 28th Annual IEEE International Conference on Local Computer Networks

Runtime Compression of MPI Messages to Improve the Performance and Scalability of Parallel Applications
Proceedings of the 2004 ACM/IEEE conference on Supercomputing

Optimization of MPI collective communication on BlueGene/L systems
Proceedings of the 19th annual international conference on Supercomputing

Fast Lossless Compression of Scientific Floating-Point Data
DCC '06 Proceedings of the Data Compression Conference

STAR-MPI: self tuned adaptive routines for MPI collective operations
Proceedings of the 20th annual international conference on Supercomputing

MPI collective algorithm selection and quadtree encoding
Parallel Computing

Cited 1

Transparent neutral element elimination in MPI reduction operations
EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface

Abstract

This paper presents a pipeline algorithm for MPI_Reduce that uses a Run Length Encoding (RLE) scheme to improve the global reduction of sparse floating-point data. The RLE scheme is incorporated directly into the reduction process and incurs only low overhead in the worst case. The high throughput of the RLE scheme allows performance improvements even with high-performance interconnects. Random sample data and sparse vector data from a parallel FEM application are used to demonstrate the performance of the new reduction algorithm on an HPC cluster with InfiniBand interconnects.
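
The general idea of combining run-length encoding with a sum reduction can be illustrated with a minimal sketch. It is not the authors' implementation: the encoding format, the function names (rle_encode, rle_decode, reduce_sum_into), and the use of plain Python lists instead of MPI buffers are all assumptions made for illustration. The sketch collapses runs of zeros into a count and adds an encoded vector into a dense accumulator without touching the positions covered by a zero run.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: ("z", n) is a run of n zeros,
# ("v", [...]) is a block of consecutive non-zero values.
Block = Tuple[str, Union[int, List[float]]]


def rle_encode(values: List[float]) -> List[Block]:
    """Collapse runs of zeros into counts; copy non-zero values unchanged.

    Note: -0.0 compares equal to 0.0 and is therefore treated as zero.
    """
    encoded: List[Block] = []
    i, n = 0, len(values)
    while i < n:
        j = i
        if values[i] == 0.0:
            while j < n and values[j] == 0.0:
                j += 1
            encoded.append(("z", j - i))
        else:
            while j < n and values[j] != 0.0:
                j += 1
            encoded.append(("v", list(values[i:j])))
        i = j
    return encoded


def rle_decode(encoded: List[Block]) -> List[float]:
    """Expand an encoded vector back into a dense list."""
    out: List[float] = []
    for kind, payload in encoded:
        if kind == "z":
            out.extend([0.0] * payload)
        else:
            out.extend(payload)
    return out


def reduce_sum_into(acc: List[float], encoded: List[Block]) -> List[float]:
    """Add an encoded vector into a dense accumulator, in place.

    Zero runs are skipped entirely, since zero is the neutral
    element of addition.
    """
    pos = 0
    for kind, payload in encoded:
        if kind == "z":
            pos += payload  # nothing to add for a run of zeros
        else:
            for x in payload:
                acc[pos] += x
                pos += 1
    return acc
```

In this sketch a vector such as [0.0, 0.0, 1.5, 0.0, 2.0, 3.0, 0.0] is represented by five blocks, and the reduction step performs only three additions. In a message-passing setting, the encoded form would be what travels between processes, so sparse inputs would shrink both the transferred data and the arithmetic work.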