Algorithmic Redistribution Methods for Block-Cyclic Decompositions
IEEE Transactions on Parallel and Distributed Systems
A framework for symmetric band reduction
ACM Transactions on Mathematical Software (TOMS)
98¢/Mflops/s ultra-large-scale neural-network training on a PIII cluster
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
High-cost CFD on a low-cost cluster
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Optimizing locality for ODE solvers
ICS '01 Proceedings of the 15th international conference on Supercomputing
A recursive formulation of Cholesky factorization of a matrix in packed storage
ACM Transactions on Mathematical Software (TOMS)
Pipelining for Locality Improvement in RK Methods
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
ISCOPE '98 Proceedings of the Second International Symposium on Computing in Object-Oriented Parallel Environments
ParNum '99 Proceedings of the 4th International ACPC Conference Including Special Tracks on Parallel Numerics and Parallel Computing in Image Processing, Video Processing, and Multimedia: Parallel Computation
Blocking Techniques in Numerical Software
ParNum '99 Proceedings of the 4th International ACPC Conference Including Special Tracks on Parallel Numerics and Parallel Computing in Image Processing, Video Processing, and Multimedia: Parallel Computation
Better tiling and array contraction for compiling scientific programs
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
A comparison of empirical and model-driven optimization
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Estimating cache misses and locality using stack distances
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
The design and implementation of a new out-of-core sparse Cholesky factorization method
ACM Transactions on Mathematical Software (TOMS)
A fast Fourier transform compiler
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Parallel and fully recursive multifrontal sparse Cholesky
Future Generation Computer Systems - Special issue: Selected numerical algorithms
Multilevel hierarchical matrix multiplication on clusters
Proceedings of the 18th annual international conference on Supercomputing
Rating Compiler Optimizations for Automatic Performance Tuning
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Performance and environment monitoring for continuous program optimization
IBM Journal of Research and Development
OpenMP issues arising in the development of parallel BLAS and LAPACK libraries
Scientific Programming - OpenMP
Improving locality for ODE solvers by program transformations
Scientific Programming
Optimizing code through iterative specialization
Proceedings of the 2008 ACM symposium on Applied computing
Combining building blocks for parallel multi-level matrix multiplication
Parallel Computing
Automatic analysis for managing and optimizing performance-code quality
Proceedings of the 2008 workshop on Static analysis
High-performance technical computing with Erlang
Proceedings of the 7th ACM SIGPLAN workshop on ERLANG
Achieving accurate and context-sensitive timing for code optimization
Software—Practice & Experience
Optimization of a Computational Fluid Dynamics Code for the Memory Hierarchy: A Case Study
International Journal of High Performance Computing Applications
Proceedings of the 2nd ACM/SPEC International Conference on Performance engineering
autopin: automated optimization of thread-to-core pinning on multicore systems
Transactions on high-performance embedded architectures and compilers III
Smart data structures: an online machine learning approach to multicore data structures
Proceedings of the 8th ACM international conference on Autonomic computing
Journal of Computational and Applied Mathematics
Manipulating MAXLIVE for spill-free register allocation
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
A data locality methodology for matrix-matrix multiplication algorithm
The Journal of Supercomputing
Automatic tuning of PDGEMM towards optimal performance
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Implementing a GPU programming model on a Non-GPU accelerator architecture
ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
Profiling of task-based applications on shared memory machines: scalability and bottlenecks
Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing