Cache oblivious algorithms for nonserial polyadic programming

Authors:
Guangming Tan;Shengzhong Feng;Ninghui Sun
Affiliations:
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China 100080;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China 100080;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China 100080
Venue:
The Journal of Supercomputing
Year:
2007

Abstract

The nonserial polyadic dynamic programming algorithm is one of the most fundamental algorithms for solving discrete optimization problems. Although its loops resemble those of matrix multiplication, the available automatic optimization techniques have little effect on this imperfect loop nest because its data dependencies are nonuniform. In this paper, we develop algorithmic optimizations that improve the cache performance of the nonserial polyadic dynamic programming algorithm. Our algorithmic transformation takes advantage of the cache oblivious method by relaxing some dependencies in the standard iterative version. Based on the ideal cache model of the cache oblivious algorithm, the approximate bound on cache misses is $\Theta(\frac{n^{3}Z}{L\sqrt{Z}}+\frac{n^{2}}{L}+\frac{n}{L\sqrt{Z}})$. We also find that the algorithm optimized with the cache oblivious approach is more sensitive to conventional optimization techniques such as tiling. Experimental results on several platforms show that the optimized algorithms improve cache performance and achieve speedups of 2 to 10 times.
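
For orientation, the standard iterative version that the abstract refers to can be sketched as follows. This is a minimal illustration only: the abstract does not state the recurrence, so the sketch assumes the commonly used triangular form d[i][j] = min(d[i][j], d[i][k] + d[k][j]) for i < k < j, and the function name `npdp_iterative` is hypothetical. It shows the baseline loop nest, not the cache oblivious transformation developed in the paper.

```python
import math


def npdp_iterative(d):
    """Baseline nonserial polyadic DP over the upper triangle of d (in place).

    Assumed recurrence (not taken from the abstract):
        d[i][j] = min(d[i][j], min over i < k < j of d[i][k] + d[k][j])
    Entries with j > i hold the initial costs; other entries are ignored.
    """
    n = len(d)
    # Process cells by increasing distance from the diagonal so that both
    # operands d[i][k] and d[k][j] are final before d[i][j] is computed.
    for span in range(2, n):
        for i in range(n - span):
            j = i + span
            best = d[i][j]
            # The range of k depends on both i and j, so the innermost loop
            # has varying bounds; this is what separates the loop nest from
            # the rectangular loops of matrix multiplication.
            for k in range(i + 1, j):
                candidate = d[i][k] + d[k][j]
                if candidate < best:
                    best = candidate
            d[i][j] = best
    return d


if __name__ == "__main__":
    inf = math.inf
    table = [
        [0, 1, 10, 100],
        [inf, 0, 2, 50],
        [inf, inf, 0, 3],
        [inf, inf, inf, 0],
    ]
    npdp_iterative(table)
    print(table[0][3])
```

The innermost loop reads row i and column k-to-j of the same triangular table, so for large n the column accesses stride through memory; this access pattern is the usual motivation for the cache-aware and cache oblivious restructurings the abstract discusses.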