Analysis of Memory Hierarchy Performance of Block Data Layout

Authors:
Neungsoo Park; Bo Hong; Viktor K. Prasanna
Venue:
ICPP '02 Proceedings of the 2002 International Conference on Parallel Processing
Year:
2002

Abstract

Recently, several experimental studies have been conducted on block data layout as a data transformation technique used in conjunction with tiling to improve cache performance. In this paper, we provide a theoretical analysis of the TLB and cache performance of block data layout. For standard matrix access patterns, we derive an asymptotic lower bound on the number of TLB misses for any data layout and show that block data layout achieves this bound. We show that block data layout reduces the number of TLB misses by a factor of O(B) compared with conventional data layouts, where B is the block size of block data layout. This reduction contributes to the improvement in memory hierarchy performance. Using our TLB and cache analysis, we also discuss the impact of block size on the overall memory hierarchy performance. These results are validated through simulations and experiments on state-of-the-art platforms.
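
The layout described in the abstract can be illustrated with a small sketch. It is not the authors' code or their analytical model. It assumes an n x n matrix split into square b x b blocks, with blocks stored in row-major order and elements stored row-major inside each block, and it counts distinct pages touched during one column walk as a rough stand-in for TLB pressure. The function names, the parameter `page_elems` (elements per page), and the choice of b as the block's side length are assumptions made for illustration; the abstract does not say whether B denotes the side or the element count of a block.

```python
def row_major_offset(i, j, n):
    """Linear offset of element (i, j) in a conventional row-major n x n matrix."""
    return i * n + j


def block_layout_offset(i, j, n, b):
    """Linear offset of element (i, j) when the matrix is stored block by block.

    Assumes b divides n. Blocks are laid out in row-major order, and the
    b * b elements of each block are contiguous (row-major within the block).
    """
    blocks_per_row = n // b
    block_index = (i // b) * blocks_per_row + (j // b)
    return block_index * b * b + (i % b) * b + (j % b)


def pages_touched(offsets, page_elems):
    """Number of distinct pages covered by the given element offsets."""
    return len({off // page_elems for off in offsets})


def column_walk_pages(n, b, page_elems, col=0):
    """Pages touched while reading one full column under each layout."""
    rm = pages_touched(
        (row_major_offset(i, col, n) for i in range(n)), page_elems
    )
    bl = pages_touched(
        (block_layout_offset(i, col, n, b) for i in range(n)), page_elems
    )
    return rm, bl


if __name__ == "__main__":
    # Hypothetical sizes: 1024 x 1024 matrix, 32 x 32 blocks,
    # and pages that hold 1024 elements (one block fills one page).
    rm, bl = column_walk_pages(n=1024, b=32, page_elems=1024)
    print("row-major pages:", rm)
    print("block layout pages:", bl)
```

With these hypothetical sizes, each row of the row-major matrix fills one page, so the column walk touches a new page on every element, whereas in the block layout each 32 x 32 block fills one page and the walk touches only one page per block row. Under this simplified page-count model the gap between the two layouts grows with the block side b, which is the kind of reduction the abstract attributes to block data layout.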