Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Improving register allocation for subscripted variables
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
MOB forms: a class of multilevel block algorithms for dense linear algebra operations
ICS '94 Proceedings of the 8th international conference on Supercomputing
Improving performance of linear algebra algorithms for dense matrices, using algorithmic prefetch
IBM Journal of Research and Development
Complexity/performance tradeoffs with non-blocking loads
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Tolerating latency through software-controlled data prefetching
Matrix multiplication: a case study of enhanced data cache utilization
Journal of Experimental Algorithmics (JEA)
Reducing off-chip memory access via stream-conscious tiling on multimedia applications
International Journal of Parallel Programming
New data structures for matrices and specialized inner kernels: low overhead for high performance
PPAM'07 Proceedings of the 7th international conference on Parallel processing and applied mathematics
Exploring a Novel Gathering Method for Finite Element Codes on the Cell/B.E. Architecture
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Compiler-optimized kernels: an efficient alternative to hand-coded inner kernels
ICCSA'06 Proceedings of the 2006 international conference on Computational Science and Its Applications - Volume Part V