The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Tile size selection using cache organization and data layout
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Combining loop transformations considering caches and scheduling
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Data transformations for eliminating conflict misses
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Locality optimizations for multi-level caches
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Tiling optimizations for 3D scientific computations
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Variable-sized object packing and its applications to instruction cache design
Computers and Electrical Engineering
Hi-index | 0.00 |
Because performance disparity between processor and main memory is serious, it is necessary to reduce off-chip memory accesses by exploiting temporal locality. Loop tiling is a well-known optimization which enhances data locality. In this paper, we show a new cost model to select the best tile size in 3D partial differential equations. Our cost model carefully takes account of the effect of cache line. We present performance evaluation of our cost models. The evaluation results reveal the superiority of our cost model to other cost models proposed so far.