The nonserial polyadic dynamic programming algorithm is one of the most fundamental algorithms for solving discrete optimization problems. Although its loops resemble those of matrix multiplication, existing automatic optimization techniques have little effect on this imperfectly nested loop because its data dependencies are nonuniform. In this paper, we develop algorithmic optimizations to improve the cache performance of the nonserial polyadic dynamic programming algorithm. Our algorithmic transformation builds on the cache-oblivious approach by relaxing some of the dependencies in the standard iterative version. Under the ideal cache model used to analyze cache-oblivious algorithms, the number of cache misses is approximately bounded by $\Theta(\frac{n^{3}Z}{L\sqrt{Z}}+\frac{n^{2}}{L}+\frac{n}{L\sqrt{Z}})$, where $n$ is the problem size, $Z$ is the cache size, and $L$ is the cache-line size. We also found that the cache-oblivious version of the algorithm responds more strongly to conventional optimization techniques such as tiling. Experimental results on several platforms show that the optimized algorithms improve cache performance and achieve speedups of 2 to 10 times.
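The abstract does not spell out the recurrence, so the following Python sketch assumes the common textbook form $c[i][j] = \min_{i<k<j}\{c[i][k] + c[k][j] + w(i,k,j)\}$ with base case $c[i][i+1] = 0$, instantiated as matrix-chain multiplication (a standard nonserial polyadic example). `nspdp_iterative` is a conventional iterative loop nest; `nspdp_tiled` is a plain fixed-size tiling (tile size `B`) that visits tiles in order of increasing distance from the diagonal, so that split points lying in intermediate tiles reduce to a min-plus product of two already finished tiles, much like one block step of matrix multiplication. All names here are hypothetical, and this is not the paper's cache-oblivious transformation; in Python it only demonstrates that the reordering is legal, not any cache effect.

```python
from itertools import chain
from typing import Callable, List

# Hypothetical weight w(i, k, j) of splitting interval (i, j) at k.
Weight = Callable[[int, int, int], int]
INF = float("inf")


def nspdp_iterative(n: int, w: Weight) -> List[List[float]]:
    """Conventional loop nest: c[i][j] = min_{i<k<j} c[i][k] + c[k][j] + w(i,k,j)."""
    c = [[0] * (n + 1) for _ in range(n + 1)]  # c[i][i+1] = 0 is the base case
    for j in range(2, n + 1):                  # columns left to right
        for i in range(j - 2, -1, -1):         # rows bottom to top
            c[i][j] = min(c[i][k] + c[k][j] + w(i, k, j) for k in range(i + 1, j))
    return c


def nspdp_tiled(n: int, w: Weight, B: int) -> List[List[float]]:
    """Same recurrence, evaluated over B x B tiles of the upper triangle."""
    N = n + 1
    c = [[INF] * N for _ in range(N)]
    for i in range(n):
        c[i][i + 1] = 0
    T = (N + B - 1) // B                       # number of tile rows/columns

    def cells(t: int) -> range:                # indices covered by tile t
        return range(t * B, min((t + 1) * B, N))

    for d in range(T):                         # tile diagonal: J - I == d
        for I in range(T - d):
            J = I + d
            # (1) Split points k in tiles strictly between I and J satisfy
            # i < k < j, and tiles (I, K) and (K, J) are already final, so
            # this phase is a min-plus "matrix multiply" of two tiles.
            for K in range(I + 1, J):
                for i in cells(I):
                    ci = c[i]
                    for k in cells(K):
                        cik, ck = ci[k], c[k]
                        for j in cells(J):
                            v = cik + ck[j] + w(i, k, j)
                            if v < ci[j]:
                                ci[j] = v
            # (2) Split points inside tiles I and J read cells of this same
            # tile, so visit columns left to right and rows bottom to top.
            for j in cells(J):
                for i in reversed(cells(I)):
                    if j <= i + 1:             # lower triangle or base case
                        continue
                    if I == J:
                        ks = range(i + 1, j)
                    else:
                        ks = chain(range(i + 1, (I + 1) * B), range(J * B, j))
                    for k in ks:
                        v = c[i][k] + c[k][j] + w(i, k, j)
                        if v < c[i][j]:
                            c[i][j] = v
    return c


def matrix_chain_weight(p: List[int]) -> Weight:
    """Scalar multiplications for a (p[i] x p[k]) times (p[k] x p[j]) product."""
    return lambda i, k, j: p[i] * p[k] * p[j]
```

With the textbook dimensions `p = [30, 35, 15, 5, 10, 20, 25]`, both `nspdp_iterative(6, matrix_chain_weight(p))[0][6]` and `nspdp_tiled(6, matrix_chain_weight(p), 2)[0][6]` evaluate to 15125. Phase (1) is where the matrix-multiplication-like locality lives; a cache-oblivious variant would typically replace the fixed `B` with recursive subdivision of the triangle so that no tile size has to be tuned.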