A blocked all-pairs shortest-paths algorithm

Authors:
Gayathri Venkataraman; Sartaj Sahni; Srabani Mukhopadhyaya
Affiliations:
Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL (all three authors)
Venue:
Journal of Experimental Algorithmics (JEA)
Year:
2003

Abstract

We propose a blocked version of Floyd's all-pairs shortest-paths algorithm. The blocked algorithm uses the cache more effectively than Floyd's original algorithm. In experiments, the blocked algorithm achieves a speedup over the unblocked Floyd's algorithm of between 1.6 and 1.9 on a Sun Ultra Enterprise 4000/5000 for graphs with 480 to 3200 vertices, and of between 1.6 and 2 on an SGI O2 for graphs with 240 to 1200 vertices.
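
To make the idea of blocking concrete, here is a minimal Python sketch of one common tiled formulation of Floyd's algorithm. The abstract does not give the details of the blocking scheme, so the three-phase structure, the function names (`floyd`, `relax_tile`, `blocked_floyd`), and the tile-size parameter `b` are assumptions for illustration, not necessarily the authors' exact method. The sketch also assumes the graph has no negative-weight cycles. It shows only the access pattern; the cache benefit reported above would come from a compiled implementation with `b` chosen so that the tiles in use fit in cache.

```python
import math

INF = math.inf  # marks "no edge" in the distance matrix


def floyd(dist):
    """Unblocked Floyd's algorithm; returns a new all-pairs distance matrix."""
    n = len(dist)
    d = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def relax_tile(d, i0, j0, k0, b, n):
    """Update tile (i0, j0) in place using intermediate vertices k0..k0+b-1."""
    for k in range(k0, min(k0 + b, n)):
        for i in range(i0, min(i0 + b, n)):
            for j in range(j0, min(j0 + b, n)):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]


def blocked_floyd(dist, b):
    """Tiled Floyd's algorithm with b x b tiles (assumed three-phase scheme)."""
    n = len(dist)
    d = [row[:] for row in dist]
    starts = range(0, n, b)
    for k0 in starts:
        # Phase 1: the diagonal tile depends only on itself.
        relax_tile(d, k0, k0, k0, b, n)
        # Phase 2: tiles sharing a row or column with the diagonal tile.
        for t in starts:
            if t != k0:
                relax_tile(d, k0, t, k0, b, n)
                relax_tile(d, t, k0, k0, b, n)
        # Phase 3: all remaining tiles, using the results of phase 2.
        for i0 in starts:
            if i0 == k0:
                continue
            for j0 in starts:
                if j0 != k0:
                    relax_tile(d, i0, j0, k0, b, n)
    return d
```

Calling `blocked_floyd(dist, b)` on a square matrix with zeros on the diagonal and `INF` for missing edges should return the same matrix as `floyd(dist)` for any tile size `b >= 1`; only the order in which entries are visited changes.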