Graph abstractions are used extensively to understand and solve challenging computational problems in various scientific and engineering domains, and they have gained particular prominence in recent years for applications involving large-scale networks. In this paper, we present fast parallel implementations of three fundamental graph theory problems (Breadth-First Search, st-connectivity, and shortest paths for unweighted graphs) on multithreaded architectures such as the Cray MTA-2. The architectural features of the MTA-2 aid the design of simple, scalable, and high-performance graph algorithms. We test our implementations on large scale-free and sparse random graph instances and report strong results for both algorithm execution time and parallel performance. For instance, Breadth-First Search on a scale-free graph of 400 million vertices and 2 billion edges takes less than 5 seconds on a 40-processor MTA-2 system, with an absolute speedup of close to 30. This is a significant result in parallel computing, as prior implementations of parallel graph algorithms report very limited or no speedup on irregular and sparse graphs when compared to the best sequential implementation.
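To make the three problems named in the abstract concrete, the following is a minimal sequential sketch of level-synchronous Breadth-First Search, from which unweighted shortest-path distances and an st-connectivity test both follow. It is not the MTA-2 implementation described in the paper; the function names (`build_adjacency`, `bfs_levels`, `st_connected`) and the edge-list input format are assumptions made for illustration only. In a multithreaded version, the loop over each frontier is the part one would expect to run in parallel.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency list from (u, v) pairs (hypothetical helper)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def bfs_levels(adj, source):
    """Level-synchronous BFS.

    Returns a dict mapping each vertex reachable from `source` to its
    distance in edges, which is the shortest-path length in an
    unweighted graph.
    """
    dist = {source: 0}
    frontier = [source]
    while frontier:
        next_frontier = []
        # Every vertex in `frontier` is at the same distance from the source.
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in dist:  # first visit fixes the shortest distance
                    dist[v] = dist[u] + 1
                    next_frontier.append(v)
        frontier = next_frontier
    return dist


def st_connected(adj, s, t):
    """st-connectivity: True if some path joins s and t."""
    return t in bfs_levels(adj, s)
```

This sketch runs a full search for the connectivity test for simplicity; a practical version could stop as soon as `t` is reached.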