Partitioned Global Address Space (PGAS) languages have grown in popularity in recent years because they combine high programmability with good performance, achieved through efficient exploitation of data locality, especially on hierarchical architectures such as multicore clusters. This paper describes UPCBLAS, a parallel numerical library for dense matrix computations written in the PGAS language Unified Parallel C (UPC). The UPCBLAS routines are built on top of sequential Basic Linear Algebra Subprograms (BLAS) functions and exploit the particularities of the PGAS paradigm, taking data locality into account to achieve good performance. The routines also implement further optimization techniques, several of which adapt automatically to the hardware characteristics of the system on which they run. The library has been evaluated experimentally on a multicore supercomputer and compared with a message-passing-based parallel numerical library, demonstrating good scalability and efficiency. Copyright © 2012 John Wiley & Sons, Ltd.
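To make the idea of building a parallel routine on top of a sequential kernel while respecting data locality more concrete, here is a minimal sketch in plain C. It is not the UPCBLAS interface: the names `local_gemv`, `owned_rows`, and `block_row_gemv` are hypothetical, the block-row distribution is an assumed layout chosen for illustration, and the loop over `t` only emulates what each thread would do concurrently on the rows it owns. In a real library the inner kernel would typically be a call into an optimized sequential BLAS rather than the hand-written loop shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a sequential BLAS level-2 kernel:
   y = alpha * A * x + beta * y, with A stored row-major (rows x cols). */
void local_gemv(size_t rows, size_t cols, double alpha,
                const double *a, const double *x,
                double beta, double *y)
{
    for (size_t i = 0; i < rows; ++i) {
        double acc = 0.0;
        for (size_t j = 0; j < cols; ++j)
            acc += a[i * cols + j] * x[j];
        y[i] = alpha * acc + beta * y[i];
    }
}

/* Assumed block-row distribution: thread t owns a contiguous slice of the
   n rows; the first (n % threads) threads receive one extra row. */
void owned_rows(size_t n, size_t threads, size_t t,
                size_t *first, size_t *count)
{
    assert(threads > 0 && t < threads);
    size_t base = n / threads;
    size_t extra = n % threads;
    *first = t * base + (t < extra ? t : extra);
    *count = base + (t < extra ? 1 : 0);
}

/* Hypothetical driver. The loop over t emulates the threads one after
   another; each "thread" touches only the rows of A and the entries of y
   that it owns, and hands that local block to the sequential kernel. */
void block_row_gemv(size_t n, size_t cols, size_t threads, double alpha,
                    const double *a, const double *x,
                    double beta, double *y)
{
    for (size_t t = 0; t < threads; ++t) {
        size_t first, count;
        owned_rows(n, threads, t, &first, &count);
        local_gemv(count, cols, alpha, a + first * cols, x, beta, y + first);
    }
}
```

The design point the sketch illustrates is that each thread's work reduces to one sequential kernel call on data it already holds, so the only remote data it needs under this assumed layout is the input vector `x`.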