Efficient Breadth-First Search on the Cell/BE Processor

Authors:
Daniele Paolo Scarpazza;Oreste Villa;Fabrizio Petrini
Affiliations:
-;-;-
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2008

Citing 0
Cited 13

3D seismic imaging through reverse-time migration on homogeneous and heterogeneous multi-core processors

Scientific Programming - High Performance Computing with the Cell Broadband Engine
Parallel exact inference on the Cell Broadband Engine processor

Journal of Parallel and Distributed Computing
Fast PGAS connected components algorithms

Proceedings of the Third Conference on Partitioned Global Address Space Programing Models
An effective GPU implementation of breadth-first search

Proceedings of the 47th Design Automation Conference
Scalable Graph Exploration on Multicore Processors

Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Parallel breadth-first search on distributed memory systems

Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
I/O-efficient data structures for colored range and prefix reporting

Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms
Scalable GPU graph traversal

Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Highly scalable graph search for the Graph500 benchmark

Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
A yoke of oxen and a thousand chickens for heavy lifting graph processing

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Breaking the speed and scalability barriers for graph exploration on distributed-memory machines

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Large-scale energy-efficient graph traversal: a path to efficient data-intensive supercomputing

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Understanding parallelism in graph traversal on multi-core clusters

Computer Science - Research and Development

Quantified Score

Hi-index	0.01

Visualization

Abstract

Multi-core processors are a shift of paradigm in computer architecture that promises a dramatic increase in performance. But they also bring an unprecedented level of complexity in algorithmic design and software development. In this paper we describe the challenges involved in designing a Breadth-First Search (BFS) algorithm for the Cell/B.E. processor. The proposed methodology combines a high-level algorithmic design that captures the machine-independent aspects, to guarantee portability with performance to future processors, with an implementation that embeds processor-specific optimizations. Using a fine-grained global coordination strategy derived by the Bulk-Synchronous Parallel (BSP) model, we have determined an accurate performance model that has guided the implementation and the optimization of our algorithm. Our experiments on a pre-production Cell/B.E. board running at 3.2 GHz, show almost linear speedups when using multiple synergistic processing elements, and an impressive level of performance when compared to other processors. On graphs which offer sufficient parallelism, the Cell/B.E. is typically an order of magnitude faster than conventional processors, such as the AMD Opteron and the Intel Pentium 4 and Woodcrest, and custom-designed architectures, such as the MTA-2 and BlueGene/L.