The analysis of the huge and ever-accumulating backlog of data presents a major challenge in all areas of computing. Inverse covariance matrices are central in this respect. We target uncertainty quantification of data, a very useful measure of which is provided by the diagonal entries of the inverse covariance matrix. In previous work, we introduced a novel method that reduces the overall complexity by at least two orders of magnitude. At the same time, a state-of-the-art message-passing interface (MPI) implementation allowed us to reach a sustained performance of up to 73% of peak (730 TFLOPS on the full 72-rack Blue Gene/P configuration at Jülich). Thanks to its reduced complexity, this work has attracted significant interest, and we have received numerous requests concerning its application in various fields. A common denominator of these requests is that almost all came from people with little or no high-performance computing background. Nevertheless, all of them aim to analyze huge data sets by suitably adapting the method to their particular applications. A bottleneck, then, is that potential users are reluctant to climb the steep learning curve required to become proficient in parallel programming with the de facto standard, MPI. We therefore turned to the Partitioned Global Address Space (PGAS) programming model and, in particular, the Unified Parallel C (UPC) language. In this work, we give a comprehensive description of the framework and demonstrate the efficiency of the state-of-the-art MPI implementation. In addition, we show that one can develop an easy-to-follow yet efficient UPC implementation that is also easy to debug and maintain, features that significantly boost overall productivity. Copyright © 2011 John Wiley & Sons, Ltd.
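
To give a concrete flavor of the approach (the abstract itself contains no code, so what follows is an illustration, not the authors' implementation): the diagonal entries of the inverse covariance matrix can be estimated stochastically, in the spirit of the diagonal estimators this line of work builds on, as diag(A^{-1}) ≈ (Σ_s v_s ⊙ A^{-1}v_s) ⊘ (Σ_s v_s ⊙ v_s), where the v_s are random ±1 (Rademacher) probe vectors, ⊙ is the element-wise product, and ⊘ the element-wise quotient. The UPC sketch below carries out this accumulation over shared global arrays. To stay self-contained, it assumes a diagonal test matrix A, so that the solve A x = v is a one-line statement where a real covariance code would invoke an iterative solver; NLOC, NPROBES, and all identifiers are illustrative choices.

/* Minimal UPC sketch (illustrative, not the paper's code): estimate
 * diag(A^{-1}) with Rademacher probes.  A is assumed diagonal so that
 * x = A^{-1} v is trivial; in practice an iterative solver goes there.
 *
 * Build/run with e.g. Berkeley UPC:  upcc est.c -o est && upcrun -n 4 est
 */
#include <upc.h>
#include <stdio.h>
#include <stdlib.h>

#define NLOC    256              /* elements per thread                 */
#define NPROBES 32               /* number of random probe vectors      */
#define N       (NLOC * THREADS) /* global problem size                 */

shared double adiag[NLOC * THREADS]; /* diagonal of the test matrix A   */
shared double num[NLOC * THREADS];   /* accumulates v .* (A^{-1} v)     */
shared double den[NLOC * THREADS];   /* accumulates v .* v              */

int main(void)
{
    int i, s;
    srand(1234 + MYTHREAD);      /* decorrelate the threads' probes     */

    /* Each thread touches only the elements it has affinity to. */
    upc_forall (i = 0; i < N; i++; &adiag[i]) {
        adiag[i] = 2.0 + i % 7;  /* arbitrary positive diagonal         */
        num[i] = den[i] = 0.0;
    }
    upc_barrier;

    for (s = 0; s < NPROBES; s++) {
        upc_forall (i = 0; i < N; i++; &num[i]) {
            double v = (rand() & 1) ? 1.0 : -1.0; /* Rademacher entry   */
            double x = v / adiag[i]; /* stand-in for solving A x = v    */
            num[i] += v * x;
            den[i] += v * v;
        }
    }
    upc_barrier;

    if (MYTHREAD == 0)           /* sample a few estimated entries      */
        for (i = 0; i < 5; i++)
            printf("d[%d] ~ %g (exact %g)\n",
                   i, num[i] / den[i], 1.0 / adiag[i]);
    return 0;
}

Even in this toy, the productivity point the abstract makes is visible: the shared arrays and the affinity clause of upc_forall take the place of the explicit data partitioning, send/receive choreography, and index bookkeeping that an equivalent MPI code would require.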