Automatically Tuned Linear Algebra Software

Authors:
R. C. Whaley, J. Dongarra
Affiliations:
-
Venue:
Automatically Tuned Linear Algebra Software
Year:
1997

Citing 0
Cited 36

Algorithmic Redistribution Methods for Block-Cyclic Decompositions

IEEE Transactions on Parallel and Distributed Systems
A framework for symmetric band reduction

ACM Transactions on Mathematical Software (TOMS)
98¢/Mflops/s ultra-large-scale neural-network training on a pIII cluster

Proceedings of the 2000 ACM/IEEE conference on Supercomputing
High-cost CFD on a low-cost cluster

Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Optimizing locality for ODE solvers

ICS '01 Proceedings of the 15th international conference on Supercomputing
A recursive formulation of Cholesky factorization of a matrix in packed storage

ACM Transactions on Mathematical Software (TOMS)
Pipelining for Locality Improvement in RK Methods

Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
The Matrix Template Library: A Generic Programming Approach to High Performance Numerical Linear Algebra

ISCOPE '98 Proceedings of the Second International Symposium on Computing in Object-Oriented Parallel Environments
HPF and Numerical Libraries

ParNum '99 Proceedings of the 4th International ACPC Conference Including Special Tracks on Parallel Numerics and Parallel Computing in Image Processing, Video Processing, and Multimedia: Parallel Computation
Blocking Techniques in Numerical Software

ParNum '99 Proceedings of the 4th International ACPC Conference Including Special Tracks on Parallel Numerics and Parallel Computing in Image Processing, Video Processing, and Multimedia: Parallel Computation
Better tiling and array contraction for compiling scientific programs

Proceedings of the 2002 ACM/IEEE conference on Supercomputing
A comparison of empirical and model-driven optimization

PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Estimating cache misses and locality using stack distances

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
The design and implementation of a new out-of-core sparse Cholesky factorization method

ACM Transactions on Mathematical Software (TOMS)
A fast Fourier transform compiler

ACM SIGPLAN Notices - Best of PLDI 1979-1999
Parallel and fully recursive multifrontal sparse Cholesky

Future Generation Computer Systems - Special issue: Selected numerical algorithms
Multilevel hierarchical matrix multiplication on clusters

Proceedings of the 18th annual international conference on Supercomputing
Rating Compiler Optimizations for Automatic Performance Tuning

Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Performance and environment monitoring for continuous program optimization

IBM Journal of Research and Development
OpenMP issues arising in the development of parallel BLAS and LAPACK libraries

Scientific Programming - OpenMP
Improving locality for ODE solvers by program transformations

Scientific Programming
Optimizing code through iterative specialization

Proceedings of the 2008 ACM symposium on Applied computing
Combining building blocks for parallel multi-level matrix multiplication

Parallel Computing
Automatic analysis for managing and optimizing performance-code quality

Proceedings of the 2008 workshop on Static analysis
High-performance technical computing with Erlang

Proceedings of the 7th ACM SIGPLAN workshop on ERLANG
Achieving accurate and context-sensitive timing for code optimization

Software—Practice & Experience
Optimization of a Computational Fluid Dynamics Code for the Memory Hierarchy: A Case Study

International Journal of High Performance Computing Applications
Dynamic selection of implementation variants of sequential iterated Runge-Kutta methods with tile size sampling

Proceedings of the 2nd ACM/SPEC International Conference on Performance engineering
autopin: automated optimization of thread-to-core pinning on multicore systems

Transactions on high-performance embedded architectures and compilers III
Smart data structures: an online machine learning approach to multicore data structures

Proceedings of the 8th ACM international conference on Autonomic computing
An efficient time-step-based self-adaptive algorithm for predictor-corrector methods of Runge-Kutta type

Journal of Computational and Applied Mathematics
Manipulating MAXLIVE for spill-free register allocation

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
A data locality methodology for matrix-matrix multiplication algorithm

The Journal of Supercomputing
Automatic tuning of PDGEMM towards optimal performance

Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Implementing a GPU programming model on a Non-GPU accelerator architecture

ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
Profiling of task-based applications on shared memory machines: scalability and bottlenecks

Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing