Analysis and performance results of computing betweenness centrality on IBM Cyclops64

  • Authors:
  • Guangming Tan;Vugranam C. Sreedhar;Guang R. Gao

  • Affiliations:
  • Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China and Computer Architecture and Parallel Systems Laboratory, University of Delaware, Newark, USA;IBM T. J. Watson Research Center, Cambridge, USA;Computer Architecture and Parallel Systems Laboratory, University of Delaware, Newark, USA

  • Venue:
  • The Journal of Supercomputing
  • Year:
  • 2011

Abstract

This paper presents a joint study of application and architecture aimed at improving the performance and scalability of an irregular application, computing betweenness centrality, on the IBM Cyclops64 many-core architecture. Betweenness centrality exhibits unstructured parallelism, dynamically non-contiguous memory access, and low arithmetic intensity, all of which hinder the efficient mapping of parallel algorithms onto such many-core architectures. By identifying several key architectural features, we propose and evaluate efficient strategies for achieving scalability on a massively multithreaded many-core architecture. We demonstrate several optimization strategies, including multi-grain parallelism, just-in-time locality with an explicit memory hierarchy and non-preemptive thread execution, and fine-grain data synchronization. Compared with a conventional parallel algorithm, we obtain a 4X-50X improvement in performance and a 16X improvement in scalability on a 128-core IBM Cyclops64 simulator.
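
For readers unfamiliar with the workload, the following is a minimal sequential sketch of betweenness centrality on an unweighted graph, using the well-known Brandes formulation (a breadth-first search followed by a reverse dependency accumulation). It is an illustration only and not the parallel algorithm evaluated in the paper; the function name `betweenness_centrality` and the adjacency-dictionary input format are assumptions made for this example. The neighbor lookups driven by the BFS frontier and the single division per edge hint at the irregular memory access and low arithmetic intensity the abstract refers to.

```python
from collections import deque


def betweenness_centrality(adj):
    """Sequential Brandes-style sketch (illustrative, not the paper's code).

    adj: dict mapping each vertex to a list of its neighbors (unweighted).
    Returns a dict of scores summed over ordered (source, target) pairs,
    so an undirected graph counts each unordered pair twice.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Phase 1: BFS from s, counting shortest paths (sigma) and
        # recording predecessors on those paths.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s] = 1
        dist[s] = 0
        order = []  # vertices in non-decreasing distance from s
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:  # data-dependent, non-contiguous accesses
                if dist[w] < 0:  # first visit
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies from the farthest vertices back.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

On the three-vertex path `{0: [1], 1: [0, 2], 2: [1]}`, only the middle vertex lies between other vertices, so it receives the only nonzero score. Each source vertex's traversal is independent of the others, which gives a coarse-grain source of parallelism, while the work inside one traversal depends on the graph's shape.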