Data prefetching and multilevel blocking for linear algebra operations

Authors:
Juan J. Navarro;Elena García-Diego;José R. Herrero
Affiliations:
Computer Architecture Department, Universitat Politècnica de Catalunya, Gran Capità s/n, Mòdul D6, E-08034 Barcelona, Spain (all three authors)
Venue:
ICS '96 Proceedings of the 10th international conference on Supercomputing
Year:
1996

References (11)

Optimal loop parallelization
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language Design and Implementation

Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language Design and Implementation

Parallel algorithms for dense linear algebra computations
SIAM Review

Improving register allocation for subscripted variables
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming Language Design and Implementation

Software prefetching
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems

The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems

Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems

MOB forms: a class of multilevel block algorithms for dense linear algebra operations
ICS '94 Proceedings of the 8th international conference on Supercomputing

Improving performance of linear algebra algorithms for dense matrices, using algorithmic prefetch
IBM Journal of Research and Development

Complexity/performance tradeoffs with non-blocking loads
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture

Tolerating latency through software-controlled data prefetching

Cited by (5)

Matrix multiplication: a case study of enhanced data cache utilization
Journal of Experimental Algorithmics (JEA)

Reducing off-chip memory access via stream-conscious tiling on multimedia applications
International Journal of Parallel Programming

New data structures for matrices and specialized inner kernels: low overhead for high performance
PPAM'07 Proceedings of the 7th international conference on Parallel processing and applied mathematics

Exploring a Novel Gathering Method for Finite Element Codes on the Cell/B.E. Architecture
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis

Compiler-optimized kernels: an efficient alternative to hand-coded inner kernels
ICCSA'06 Proceedings of the 2006 international conference on Computational Science and Its Applications - Volume Part V
