Optimizing Large-Scale Graph Analysis on a Multi-threaded, Multi-core Platform

Authors:
Guojing Cong; Konstantin Makarychev
Venue:
IPDPS '11 Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium
Year:
2011

Citing 0
Cited 2

Managing large graphs on multi-cores with graph awareness
USENIX ATC'12 Proceedings of the 2012 USENIX conference on Annual Technical Conference

Betweenness centrality: algorithms and implementations
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming

Abstract

Erratic memory access patterns make it hard to implement fast large-scale graph analysis. Although fine-grained parallel algorithms seem to benefit from multithreading, it is unclear whether the long memory latency of such workloads is fully masked on current systems and, if it is not, whether improving locality brings any performance benefit, especially when the cache is simple. We optimize several fundamental graph algorithms on a multi-threaded, multi-core platform with simple caches. Although the naive implementation scales, we nonetheless show that the number of hardware threads is insufficient to fully mask the memory latency of typical graph analysis workloads, and that the processor is unlikely to be fully utilized. In optimizing for cache performance, we show that known cache-friendly designs that prove effective on traditional architectures do not perform well on this platform. We explore low-cost measures, such as software prefetching and manipulating the storage of the input, to improve performance. Our results show that, compared with the original implementation, our optimizations achieve speedups between 10% and 200% at different thread counts.
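The abstract does not say how the storage of the input is manipulated. As a rough illustration only, the sketch below assumes one common approach: renumber vertices in breadth-first visit order, then store the graph in compressed sparse row (CSR) form with each adjacency list sorted, so that vertices visited close together in time tend to sit close together in memory. The CSR layout, the BFS relabeling, and the function names `build_csr`, `bfs_relabel`, and `sorted_csr` are all assumptions made for this example and are not taken from the paper.

```python
from collections import deque


def build_csr(num_vertices, edges):
    """Build CSR arrays (offsets, targets) for an undirected edge list."""
    degree = [0] * num_vertices
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + degree[v]
    targets = [0] * offsets[num_vertices]
    cursor = offsets[:-1].copy()  # next free slot in each vertex's list
    for u, v in edges:
        targets[cursor[u]] = v
        cursor[u] += 1
        targets[cursor[v]] = u
        cursor[v] += 1
    return offsets, targets


def bfs_relabel(num_vertices, edges):
    """Renumber vertices in BFS visit order (hypothetical locality heuristic).

    Returns (new_id, relabeled_edges), where new_id[old] is the new label.
    """
    offsets, targets = build_csr(num_vertices, edges)
    new_id = [-1] * num_vertices
    next_label = 0
    for root in range(num_vertices):  # also covers disconnected components
        if new_id[root] != -1:
            continue
        new_id[root] = next_label
        next_label += 1
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for i in range(offsets[u], offsets[u + 1]):
                w = targets[i]
                if new_id[w] == -1:
                    new_id[w] = next_label
                    next_label += 1
                    queue.append(w)
    relabeled = [(new_id[u], new_id[v]) for u, v in edges]
    return new_id, relabeled


def sorted_csr(num_vertices, edges):
    """Build CSR with each adjacency list sorted by neighbor id."""
    offsets, targets = build_csr(num_vertices, edges)
    for v in range(num_vertices):
        lo, hi = offsets[v], offsets[v + 1]
        targets[lo:hi] = sorted(targets[lo:hi])
    return offsets, targets
```

A caller would run `bfs_relabel` once on the input edge list and then build the working graph with `sorted_csr` on the relabeled edges. Whether such a reordering pays off on a platform like the one studied here is exactly the kind of question the paper measures, so this sketch makes no performance claim.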