Exploring the limits of GPGPU scheduling in control flow bound applications
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
A GPU implementation of inclusion-based points-to analysis
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Green-Marl: a DSL for easy and efficient graph analysis
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Managing large graphs on multi-cores with graph awareness
USENIX ATC'12 Proceedings of the 2012 USENIX conference on Annual Technical Conference
A yoke of oxen and a thousand chickens for heavy lifting graph processing
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Direction-optimizing breadth-first search
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Breaking the speed and scalability barriers for graph exploration on distributed-memory machines
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Ligra: a lightweight graph processing framework for shared memory
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Massive data analytics: the graph 500 on IBM Blue Gene/Q
IBM Journal of Research and Development
Glinda: a framework for accelerating imbalanced applications on heterogeneous platforms
Proceedings of the ACM International Conference on Computing Frontiers
Scale-up graph processing: a storage-centric view
First International Workshop on Graph Data Management Experiences and Systems
On fast parallel detection of strongly connected components (SCC) in small-world graphs
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Efficient breadth first search on multi-GPU systems
Journal of Parallel and Distributed Computing
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
X-Stream: edge-centric graph processing using streaming partitions
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
The energy case for graph processing on hybrid CPU and GPU systems
IA^3 '13 Proceedings of the 3rd Workshop on Irregular Applications: Architectures and Algorithms
Journal of Parallel and Distributed Computing
Efficient aerial image simulation on multi-core SIMD CPU
Proceedings of the International Conference on Computer-Aided Design
Direction-optimizing breadth-first search
Scientific Programming - Selected Papers from Super Computing 2012
Hi-index | 0.00 |
Graphs are a fundamental data representation that has been used extensively in various domains. In graph-based applications, a systematic exploration of the graph such as a breadth-first search (BFS) often serves as a key component in the processing of their massive data sets. In this paper, we present a new method for implementing the parallel BFS algorithm on multi-core CPUs which exploits a fundamental property of randomly shaped real-world graph instances. By utilizing memory bandwidth more efficiently, our method shows improved performance over the current state-of-the-art implementation and increases its advantage as the size of the graph increases. We then propose a hybrid method which, for each level of the BFS algorithm, dynamically chooses the best implementation from: a sequential execution, two different methods of multicore execution, and a GPU execution. Such a hybrid approach provides the best performance for each graph size while avoiding poor worst-case performance on high-diameter graphs. Finally, we study the effects of the underlying architecture on BFS performance by comparing multiple CPU and GPU systems, a high-end GPU system performed as well as a quad-socket high-end CPU system.