The challenge in developing next-generation software is to manage a complex computational environment successfully while giving the scientist the full power of flexible compositions of the available algorithmic alternatives. Self-adapting numerical software (SANS) systems are intended to meet this challenge. Arriving at an efficient numerical solution to a problem in computational science involves numerous decisions by a numerical expert. Attempts to automate such decisions distinguish three levels: algorithmic decisions, management of the parallel environment, and processor-specific tuning of kernels. In addition, at any of these levels we can decide to rearrange the user's data. In this paper we look at a number of efforts at the University of Tennessee to investigate these areas.