Selecting tile shape for minimal execution time
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Locality optimizations for multi-level caches
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Improving Memory Traffic by Assembly-Level Exploitation of Reuses for Vector Registers
The Journal of Supercomputing
Tiling optimizations for 3D scientific computations
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Exact analysis of the cache behavior of nested loops
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Loop Transformations for Hierarchical Parallelism and Locality
LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
On the Parallel Execution Time of Tiled Loops
IEEE Transactions on Parallel and Distributed Systems
Transforming Complex Loop Nests for Locality
The Journal of Supercomputing
Combining Models and Guided Empirical Search to Optimize for Multiple Levels of the Memory Hierarchy
Proceedings of the international symposium on Code generation and optimization
Automatic blocking of QR and LU factorizations for locality
MSP '04 Proceedings of the 2004 workshop on Memory system performance
Cache-Efficient Multigrid Algorithms
International Journal of High Performance Computing Applications