A recent trend in computing is the use of domain-specific program generators, designed to reduce the effort of porting and reoptimizing libraries for fast-changing and increasingly complex computing platforms. Examples include ATLAS, SPIRAL, and the codelet generator in FFTW. Each of these generators produces highly optimized source code directly from a problem specification. In this paper, we extend this list with a program generator for the well-known Floyd-Warshall (FW) algorithm, which solves the all-pairs shortest path problem and is important in a wide range of engineering applications.

As the first contribution, we derive variants of the FW algorithm that make it possible to apply many of the optimization techniques developed for matrix-matrix multiplication. The second contribution is the program generator itself, which uses tiling, loop unrolling, and SIMD vectorization combined with a hill-climbing search to produce the best code (float or integer) for a given platform.

Using the program generator, we demonstrate a speedup over a straightforward single-precision implementation of up to a factor of 1.3 on Pentium 4 and 1.8 on Athlon 64. Using 4-way vectorization improves performance by a further factor of up to 5.7 on Pentium 4 and 3.0 on Athlon 64. For short integers, 8-way vectorization provides a speedup of up to 4.6 on Pentium 4 and 5.0 on Athlon 64 over the best scalar code.
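To make the idea of a tiled FW variant concrete, the following is a minimal sketch in Python. It is not the generator's output and not necessarily the variant derived in the paper: it shows the straightforward triple loop next to a commonly used blocked formulation. The function names (`fw_basic`, `fw_tile`, `fw_tiled`) and the tile-size parameter `b` are assumptions made for illustration only.

```python
import math

INF = math.inf


def fw_basic(d):
    """Straightforward Floyd-Warshall; updates the distance matrix d in place."""
    n = len(d)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d


def fw_tile(d, i0, j0, k0, b):
    """Run the FW update on the tile starting at (i0, j0), with k taken from the block starting at k0."""
    n = len(d)
    for k in range(k0, min(k0 + b, n)):
        for i in range(i0, min(i0 + b, n)):
            for j in range(j0, min(j0 + b, n)):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])


def fw_tiled(d, b):
    """Blocked Floyd-Warshall with tile size b (hypothetical parameter); updates d in place."""
    n = len(d)
    starts = range(0, n, b)
    for kb in starts:
        # Phase 1: the diagonal tile depends only on itself.
        fw_tile(d, kb, kb, kb, b)
        # Phase 2: tiles in the same block row and block column as the diagonal tile.
        for jb in starts:
            if jb != kb:
                fw_tile(d, kb, jb, kb, b)
        for ib in starts:
            if ib != kb:
                fw_tile(d, ib, kb, kb, b)
        # Phase 3: all remaining tiles, which read only phase-2 results.
        for ib in starts:
            if ib == kb:
                continue
            for jb in starts:
                if jb != kb:
                    fw_tile(d, ib, jb, kb, b)
    return d
```

In this formulation the phase-3 update has the same loop structure as a matrix-matrix multiplication with `(min, +)` in place of `(+, *)`, which is why tiling, unrolling, and vectorization of the inner loops are natural candidates. The best value of `b` depends on the platform, so it is the kind of parameter a search would tune.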