Self-adapting numerical software (SANS) effort

Authors:
J. Dongarra; G. Bosilca; Z. Chen; V. Eijkhout; G. E. Fagg; E. Fuentes; J. Langou; P. Luszczek; J. Pjesivac-Grbovic; K. Seymour; H. You; S. S. Vadhiyar
Venue:
IBM Journal of Research and Development
Year:
2006

Abstract

The challenge in developing next-generation software is to manage the complex computational environment successfully while giving the scientist the full power of flexible compositions of the available algorithmic alternatives. Self-adapting numerical software (SANS) systems are intended to meet this challenge. Arriving at an efficient numerical solution to a problem in computational science involves numerous decisions by a numerical expert. Attempts to automate such decisions distinguish three levels: algorithmic decisions, management of the parallel environment, and processor-specific tuning of kernels. Additionally, at any of these levels we can decide to rearrange the user's data. In this paper we look at a number of efforts at the University of Tennessee to investigate these areas.