LAPACK: a portable linear algebra library for high-performance computers. In Proceedings of the 1990 ACM/IEEE Conference on Supercomputing.
Using MPI: portable parallel programming with the message-passing interface. MIT Press.
Global arrays: a nonuniform memory access programming model for high-performance computers. The Journal of Supercomputing.
Fast runtime block cyclic data redistribution on multiprocessors. Journal of Parallel and Distributed Computing.
Co-array Fortran for parallel programming. ACM SIGPLAN Fortran Forum.
Basic Linear Algebra Subprograms for Fortran Usage. ACM Transactions on Mathematical Software (TOMS).
In Proceedings of the 1996 ACM/IEEE Conference on Supercomputing (Supercomputing '96).
IEEE Parallel & Distributed Technology: Systems & Technology.
Python for Scientific Computing. Computing in Science and Engineering.
IPython: A System for Interactive Scientific Computing. Computing in Science and Engineering.
GPAW optimized for Blue Gene/P using hybrid programming. In IPDPS '09: Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing.
In CEFP '11: Proceedings of the 4th Summer School Conference on Central European Functional Programming School.
In this paper, we introduce DistNumPy, a library for numerical computation in Python that targets scalable distributed memory architectures. DistNumPy extends the NumPy module [15], which is popular for scientific programming. Replacing NumPy with DistNumPy enables the user to write sequential Python programs that seamlessly utilize distributed memory architectures. This is achieved by introducing a new backend for NumPy arrays that distributes data amongst the nodes of a distributed memory multiprocessor. All operations on such an array seek to utilize all available processors, and the array itself is distributed across multiple nodes in order to support arrays larger than a single node can hold in memory. We perform three experiments with sequential Python programs running on an Ethernet-based cluster of SMP nodes with a total of 64 CPU cores. The results show 88% CPU utilization for a Monte Carlo simulation, 63% for an N-body simulation, and a more modest 50% for a Jacobi solver. The primary limitation on CPU utilization is identified as limitations within the SMP nodes rather than the distribution aspect. Based on these experiments, we find that significant speedup can be obtained with the new array backend without changing the original Python code.
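To illustrate the programming model, the following is a minimal sketch of the kind of sequential NumPy program the abstract describes, here a Monte Carlo estimate of pi in the spirit of the Monte Carlo experiment reported above. The code uses only the standard NumPy API; according to the abstract, substituting DistNumPy for NumPy lets such a program run unchanged on a distributed memory machine. How DistNumPy is imported in place of NumPy is an assumption here, as the mechanism is not stated in this excerpt.

    # Sequential NumPy code; under DistNumPy the same program would run
    # distributed. (Assumption: DistNumPy acts as a drop-in replacement
    # for the "numpy" import, as the abstract suggests.)
    import numpy as np

    def monte_carlo_pi(n):
        # Draw n random points in the unit square. Under DistNumPy these
        # arrays would be partitioned across the cluster nodes.
        x = np.random.random(n)
        y = np.random.random(n)
        # Elementwise operations involve no explicit Python loops, so the
        # array backend is free to run them on all available processors.
        hits = (x * x + y * y) < 1.0
        # Under DistNumPy the sum would become a reduction across nodes.
        return 4.0 * hits.sum() / n

    print(monte_carlo_pi(10 ** 7))

Note that the program contains no parallel constructs at all; any speedup comes entirely from the array backend, which is what allows the original Python code to remain unchanged.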