Parallel graph partitioning on multicore architectures

Authors:
Xin Sui; Donald Nguyen; Martin Burtscher; Keshav Pingali
Affiliations:
Department of Computer Science, University of Texas, Austin; Department of Computer Science, University of Texas, Austin; Department of Computer Science, Texas State University, San Marcos; Department of Computer Science, University of Texas, Austin and Institute for Computational Engineering and Sciences, University of Texas, Austin
Venue:
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Year:
2010


Abstract

Graph partitioning is a common preprocessing step in many high-performance parallel applications on distributed- and shared-memory architectures. It is used to distribute graphs across memory and to improve spatial locality. There are several parallel implementations of graph partitioning for distributed-memory architectures. In this paper, we present a parallel graph partitioner that implements a variation of the Metis partitioner for shared-memory, multicore architectures. We show that (1) the parallelism in this algorithm is an instance of the general amorphous data-parallelism pattern, and (2) a parallel implementation can be derived systematically from a sequential specification of the algorithm. The resulting program can be executed in parallel using the Galois system for optimistic parallelization. The scalability of this parallel implementation compares favorably with that of ParMetis, a publicly available, hand-parallelized C implementation of the algorithm, but absolute performance is lower because of missing sequential optimizations in our system. On a set of 15 large, publicly available graphs, our implementation achieves an average scalability of 2.98X on 8 cores, compared with 1.77X for ParMetis, and an average speedup of 2.80X over Metis, compared with 3.60X for ParMetis. These results show that our systematic approach for parallelizing irregular algorithms on multicore architectures is promising.
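
To make the idea of a sequential specification more concrete, the following is a minimal Python sketch of one coarsening step of the kind used by multilevel partitioners such as Metis: a greedy heavy-edge matching followed by contraction of matched pairs. It is an illustration only, not the paper's code and not the Galois or Metis API; the function names `heavy_edge_matching` and `coarsen`, the adjacency-dictionary representation, and the example graph are all assumptions made for this sketch. Each iteration of the matching loop reads and writes only one node and its neighbors, which is the kind of neighborhood-local work on an irregular graph that the amorphous data-parallelism pattern describes; an optimistic runtime could, in principle, run iterations with disjoint neighborhoods concurrently.

```python
from collections import defaultdict


def heavy_edge_matching(adj):
    """Greedy heavy-edge matching (sequential sketch).

    adj maps each node to a dict {neighbor: edge_weight} and is assumed
    to be symmetric. Returns a dict mapping every node to its mate; a
    node that finds no unmatched neighbor is matched with itself.
    """
    match = {}
    for u in sorted(adj):  # the worklist of active nodes
        if u in match:
            continue
        best, best_w = None, 0
        # Only u's neighborhood is inspected in this iteration.
        for v, w in adj[u].items():
            if v != u and v not in match and (best is None or w > best_w):
                best, best_w = v, w
        if best is None:
            match[u] = u
        else:
            match[u] = best
            match[best] = u
    return match


def coarsen(adj, match):
    """Contract each matched pair into one coarse node.

    The coarse node takes the smaller id of the pair. Weights of
    parallel edges between two coarse nodes are summed.
    """
    coarse_id = {u: min(u, match[u]) for u in adj}
    coarse = defaultdict(lambda: defaultdict(int))
    for u, nbrs in adj.items():
        cu = coarse_id[u]
        coarse[cu]  # make sure isolated coarse nodes still appear
        for v, w in nbrs.items():
            cv = coarse_id[v]
            if cu != cv:
                coarse[cu][cv] += w
    return {u: dict(nbrs) for u, nbrs in coarse.items()}


# Hypothetical example: a weighted path 0 - 1 - 2 - 3.
example = {
    0: {1: 1},
    1: {0: 1, 2: 5},
    2: {1: 5, 3: 1},
    3: {2: 1},
}
example_match = heavy_edge_matching(example)
example_coarse = coarsen(example, example_match)
```

In this sketch the nodes are visited in sorted order, so the result depends on that order; a parallel execution would generally produce a different but equally valid matching, which is acceptable for a heuristic like coarsening.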