An extended set of FORTRAN basic linear algebra subprograms
ACM Transactions on Mathematical Software (TOMS)
A set of level 3 basic linear algebra subprograms
ACM Transactions on Mathematical Software (TOMS)
The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Using Strassen's algorithm to accelerate the solution of linear systems
The Journal of Supercomputing
LAPACK Users' Guide
DXML: a high-performance scientific subroutine library
Digital Technical Journal
Basic Linear Algebra Subprograms for Fortran Usage
ACM Transactions on Mathematical Software (TOMS)
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
The Combined Effectiveness of Unimodular Transformations, Tiling, and Software Prefetching
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Hierarchical tiling for improved superscalar performance
IPPS '95 Proceedings of the 9th International Symposium on Parallel Processing
Space-limited procedures: a methodology for portable high-performance
PMMP '95 Proceedings of the conference on Programming Models for Massively Parallel Computers
LAPACK Working Note 95: ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers -- Design Issues and Performance
Optimizing Matrix Multiply using PHiPAC: a Portable, High-Performance, ANSI C Coding Methodology
Optimizing Matrix Multiply using PHiPAC: a Portable, High-Performance, ANSI C Coding Methodology
Automatic benchmark generation for cache optimization of matrix operations
ACM-SE 33 Proceedings of the 33rd annual on Southeast regional conference
Nonlinear array layouts for hierarchical memory systems
ICS '99 Proceedings of the 13th international conference on Supercomputing
Recursive array layouts and fast parallel matrix multiplication
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Memory characteristics of iterative methods
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
AJaPACK: experiments in performance portable parallel Java numerical libraries
Proceedings of the ACM 2000 conference on Java Grande
Finding least common ancestors in directed acyclic graphs
SODA '01 Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms
Optimizing locality for ODE solvers
ICS '01 Proceedings of the 15th international conference on Supercomputing
SPL: a language and compiler for DSP algorithms
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Language support for Morton-order matrices
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
A recursive formulation of Cholesky factorization of a matrix in packed storage
ACM Transactions on Mathematical Software (TOMS)
Automatically tuned linear algebra software
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Array form representation of idiom recognition system for numerical programs
Proceedings of the 2001 conference on APL: an arrays odyssey
Stochastic search for signal processing algorithm optimization
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
Quantifying the Multi-Level Nature of Tiling Interactions
International Journal of Parallel Programming
Recursive Array Layouts and Fast Matrix Multiplication
IEEE Transactions on Parallel and Distributed Systems
HiPC '01 Proceedings of the 8th International Conference on High Performance Computing
A Family of High-Performance Matrix Multiplication Algorithms
ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I
Statistical Models for Automatic Performance Tuning
ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I
Rescheduling for Locality in Sparse Matrix Computations
ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I
ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I
Parallel and Fully Recursive Multifrontal Supernodal Sparse Cholesky
ICCS '02 Proceedings of the International Conference on Computational Science-Part II
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
PARA '02 Proceedings of the 6th International Conference on Applied Parallel Computing Advanced Scientific Computing
Code Generators for Automatic Tuning of Numerical Kernels: Experiences with FFTW
SAIG '00 Proceedings of the International Workshop on Semantics, Applications, and Implementation of Program Generation
A Characterization of Temporal Locality and Its Portability across Memory Hierarchies
ICALP '01 Proceedings of the 28th International Colloquium on Automata, Languages and Programming
Embedded Processor Design Challenges: Systems, Architectures, Modeling, and Simulation - SAMOS
Cache Models for Iterative Compilation
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Delayed Evaluation, Self-optimising Software Components as a Programming Model
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Pipelining for Locality Improvement in RK Methods
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
OCEANS - Optimising Compilers for Embedded Applications
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Fractal Matrix Multiplication: A Case Study on Portability of Cache Performance
WAE '01 Proceedings of the 5th International Workshop on Algorithm Engineering
ParNum '99 Proceedings of the 4th International ACPC Conference Including Special Tracks on Parallel Numerics and Parallel Computing in Image Processing, Video Processing, and Multimedia: Parallel Computation
Blocking Techniques in Numerical Software
ParNum '99 Proceedings of the 4th International ACPC Conference Including Special Tracks on Parallel Numerics and Parallel Computing in Image Processing, Video Processing, and Multimedia: Parallel Computation
Knowledge Discovery in Auto-tuning Parallel Numerical Library
Progress in Discovery Science, Final Report of the Japanese Discovery Science Project
Heterogeneous Networks of Workstations and the Parallel Matrix Multiplication
Proceedings of the 8th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Better tiling and array contraction for compiling scientific programs
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
A high-level approach to synthesis of high-performance codes for quantum chemistry
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Performance optimizations and bounds for sparse matrix-vector multiply
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Embedded processor design challenges
On the Parallel Execution Time of Tiled Loops
IEEE Transactions on Parallel and Distributed Systems
Reducing False Sharing and Improving Spatial Locality in a Unified Compilation Framework
IEEE Transactions on Parallel and Distributed Systems
Formal derivation of algorithms: The triangular sylvester equation
ACM Transactions on Mathematical Software (TOMS)
QR factorization with Morton-ordered quadtree matrices for memory re-use and parallelism
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
HPDC '02 Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing
Self-adapting software for numerical linear algebra and LAPACK for clusters
Parallel Computing - Special issue: Parallel and distributed scientific and engineering computing
Performance optimization of RK methods using block-based pipelining
Performance analysis and grid computing
Effect of auto-tuning with user's knowledge for numerical software
Proceedings of the 1st conference on Computing frontiers
A fast Fourier transform compiler
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Architecture of an automatically tuned linear algebra library
Parallel Computing
Parallel and fully recursive multifrontal sparse Cholesky
Future Generation Computer Systems - Special issue: Selected numerical algorithms
Multilevel hierarchical matrix multiplication on clusters
Proceedings of the 18th annual international conference on Supercomputing
High-performance linear algebra algorithms using new generalized data structures for matrices
IBM Journal of Research and Development
Communication lower bounds for distributed-memory matrix multiplication
Journal of Parallel and Distributed Computing
Optimizing Sorting with Genetic Algorithms
Proceedings of the international symposium on Code generation and optimization
Combining Models and Guided Empirical Search to Optimize for Multiple Levels of the Memory Hierarchy
Proceedings of the international symposium on Code generation and optimization
A Geometric Programming Framework for Optimal Multi-Level Tiling
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
The Opie compiler from row-major source to Morton-ordered matrices
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Energy aware lossless data compression
Proceedings of the 1st international conference on Mobile systems, applications and services
Spiral: A Generator for Platform-Adapted Libraries of Signal Processing Algorithms
International Journal of High Performance Computing Applications
Statistical Models for Empirical Search-Based Performance Tuning
International Journal of High Performance Computing Applications
Sparsity: Optimization Framework for Sparse Matrix Kernels
International Journal of High Performance Computing Applications
Automatic generation and tuning of MPI collective communication routines
Proceedings of the 19th annual international conference on Supercomputing
Automatic Tuning Matrix Multiplication Performance on Graphics Hardware
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
HPCASIA '05 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region
Reduction Transformations for Optimization Parameter Selection
HPCASIA '05 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region
Lowest common ancestors in trees and directed acyclic graphs
Journal of Algorithms
The Journal of Supercomputing
Online performance auditing: using hot optimizations without getting burned
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Optimizing locality and scalability of embedded Runge-Kutta solvers using block-based pipelining
Journal of Parallel and Distributed Computing
ABCLib_DRSSED: A parallel eigensolver with an auto-tuning facility
Parallel Computing
Self-adapting numerical software (SANS) effort
IBM Journal of Research and Development
Energy-aware lossless data compression
ACM Transactions on Computer Systems (TOCS)
Empirical optimization for a sparse linear solver: a case study
International Journal of Parallel Programming - Special issue: The next generation software program
STAR-MPI: self tuned adaptive routines for MPI collective operations
Proceedings of the 20th annual international conference on Supercomputing
Profitable loop fusion and tiling using model-driven empirical search
Proceedings of the 20th annual international conference on Supercomputing
A comparison of online and offline strategies for program adaptation
ACM-SE 45 Proceedings of the 45th annual southeast regional conference
Improving locality for ODE solvers by program transformations
Scientific Programming
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
International Journal of Computational Science and Engineering
Multi-level tiling: M for the price of one
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Sketching concurrent data structures
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Combining building blocks for parallel multi-level matrix multiplication
Parallel Computing
Families of algorithms related to the inversion of a Symmetric Positive Definite matrix
ACM Transactions on Mathematical Software (TOMS)
The impact of paravirtualized memory hierarchy on linear algebra computational kernels and software
HPDC '08 Proceedings of the 17th international symposium on High performance distributed computing
Positivity, posynomials and tile size selection
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Performance Model for Parallel Mathematical Libraries Based on Historical Knowledgebase
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
A tuning framework for software-managed memory hierarchies
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Achieving accurate and context-sensitive timing for code optimization
Software—Practice & Experience
How to Write Fast Numerical Code: A Small Introduction
Generative and Transformational Techniques in Software Engineering II
Adaptive Winograd's matrix multiplications
ACM Transactions on Mathematical Software (TOMS)
Quick and Practical Run-Time Evaluation of Multiple Program Optimizations
Transactions on High-Performance Embedded Architectures and Compilers I
PetaBricks: a language and compiler for algorithmic choice
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Model-guided autotuning of high-productivity languages for petascale computing
Proceedings of the 18th ACM international symposium on High performance distributed computing
Generating Empirically Optimized Composed Matrix Kernels from MATLAB Prototypes
ICCS '09 Proceedings of the 9th International Conference on Computational Science: Part I
A Note on Auto-tuning GEMM for GPUs
ICCS '09 Proceedings of the 9th International Conference on Computational Science: Part I
Autotuning multigrid with PetaBricks
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Automating the generation of composed linear algebra kernels
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Algorithms for memory hierarchies: advanced lectures
Algorithms for memory hierarchies: advanced lectures
Self-adapting numerical software and automatic tuning of heuristics
ICCS'03 Proceedings of the 2003 international conference on Computational science
Self-adapting software for numerical linear algebra library routines on clusters
ICCS'03 Proceedings of the 2003 international conference on Computational science: Part III
Memory hierarchy optimizations and performance bounds for sparse ATAx
ICCS'03 Proceedings of the 2003 international conference on Computational science: Part III
A compiler approach to performance prediction using empirical-based modeling
ICCS'03 Proceedings of the 2003 international conference on Computational science: Part III
Minimal data copy for dense linear algebra factorization
PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
d-spline based incremental parameter estimation in automatic performance tuning
PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
Using recursion to boost ATLAS's performance
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Speeding up Nek5000 with autotuning and specialization
Proceedings of the 24th ACM International Conference on Supercomputing
SLAMM - Automating Memory Analysis for Numerical Algorithms
Electronic Notes in Theoretical Computer Science (ENTCS)
An input-centric paradigm for program dynamic optimizations
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Measuring execution times of collective communications in an empirical optimization framework
EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface
Towards the design of an automatically tuned linear algebra library
EUROMICRO-PDP'02 Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing
Automated empirical tuning of scientific codes for performance and power consumption
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Proceedings of the 2nd ACM/SPEC International Conference on Performance engineering
The Vocal Joystick Engine v1.0
Computer Speech and Language
Parallel memory prediction for fused linear algebra kernels
ACM SIGMETRICS Performance Evaluation Review - Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10)
Parallel Low-Storage Runge-Kutta Solvers for ODE Systems with Limited Access Distance
International Journal of High Performance Computing Applications
Smart data structures: an online machine learning approach to multicore data structures
Proceedings of the 8th ACM international conference on Autonomic computing
AARTS: low overhead online adaptive auto-tuning
Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era
Probabilistic auto-tuning for architectures with complex constraints
Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era
An efficient evolutionary algorithm for solving incrementally structured problems
Proceedings of the 13th annual conference on Genetic and evolutionary computation
Autotuned parallel I/O for highly scalable biosequence analysis
Proceedings of the 2011 TeraGrid Conference: Extreme Digital Discovery
Journal of Computational and Applied Mathematics
A step towards transparent integration of input-consciousness into dynamic program optimizations
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Automatic performance programming
Proceedings of the 10th SIGPLAN symposium on New ideas, new paradigms, and reflections on programming and software
Optimizing symmetric dense matrix-vector multiplication on GPUs
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Optimizing matrix multiplication with a classifier learning system
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Analytic models and empirical search: a hybrid approach to code optimization
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
A data locality methodology for matrix-matrix multiplication algorithm
The Journal of Supercomputing
Journal of Parallel and Distributed Computing
A practical method for quickly evaluating program optimizations
HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
Compiler-optimized kernels: an efficient alternative to hand-coded inner kernels
ICCSA'06 Proceedings of the 2006 international conference on Computational Science and Its Applications - Volume Part V
JuliusC: a practical approach for the analysis of divide-and-conquer algorithms
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
A code isolator: isolating code fragments from large programs
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
A family of high-performance matrix multiplication algorithms
PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
Automatic tuning technique exploring within the hardware-specific constrained parameters
LSSC'05 Proceedings of the 5th international conference on Large-Scale Scientific Computing
An evaluation towards automatically tuned eigensolvers
LSSC'05 Proceedings of the 5th international conference on Large-Scale Scientific Computing
Evaluating iterative compilation
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Performance modeling and optimal block size selection for the small-bulge multishift QR algorithm
ISPA'06 Proceedings of the 4th international conference on Parallel and Distributed Processing and Applications
Language and compiler support for auto-tuning variable-accuracy algorithms
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Automated programmable control and parameterization of compiler optimizations
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
POET: a scripting language for applying parameterized source-to-source program transformations
Software—Practice & Experience
Analytical bounds for optimal tile size selection
CC'12 Proceedings of the 21st international conference on Compiler Construction
Cache blocking for linear algebra algorithms
PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part I
Autotuning of adaptive mesh refinement PDE solvers on shared memory architectures
PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part I
Siblingrivalry: online autotuning through local competitions
Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
Locality optimized shared-memory implementations of iterated Runge-Kutta methods
Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing
A script-based autotuning compiler system to generate high-performance CUDA code
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Layout-oblivious compiler optimization for matrix computations
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Towards a functional run-time for dense NLA domain
Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing
AUGEM: automatically generate high performance dense linear algebra kernels on x86 CPUs
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Taming parallel I/O complexity with auto-tuning
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Precimonious: tuning assistant for floating-point precision
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Spiral in scala: towards the systematic construction of generators for performance libraries
Proceedings of the 12th international conference on Generative programming: concepts & experiences
Adaptive Mapping and Parameter Selection Scheme to Improve Automatic Code Generation for GPUs
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
A Basic Linear Algebra Compiler
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
ACM Transactions on Architecture and Code Optimization (TACO)
An Infrastructure for Tackling Input-Sensitivity of GPU Program Optimizations
International Journal of Parallel Programming