The input/output complexity of sorting and related problems
Communications of the ACM
External-memory graph algorithms
Proceedings of the sixth annual ACM-SIAM symposium on Discrete algorithms
On external memory graph traversal
SODA '00 Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms
External memory algorithms and data structures: dealing with massive data
ACM Computing Surveys (CSUR)
Heuristics for semi-external depth first search on directed graphs
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
Data Structures and Algorithms
Data Structures and Algorithms
Introduction to Algorithms
External-Memory Breadth-First Search with Sublinear I/O
ESA '02 Proceedings of the 10th Annual European Symposium on Algorithms
Improved Algorithms and Data Structures for Solving Graph Problems in External Memory
SPDP '96 Proceedings of the 8th IEEE Symposium on Parallel and Distributed Processing (SPDP '96)
A computational study of external-memory BFS algorithms
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
I/O-efficient undirected shortest paths with unbounded edge lengths
ESA'06 Proceedings of the 14th conference on Annual European Symposium - Volume 14
A Heuristic Strong Connectivity Algorithm for Large Graphs
SEA '09 Proceedings of the 8th International Symposium on Experimental Algorithms
Design and Engineering of External Memory Traversal Algorithms for General Graphs
Algorithmics of Large and Complex Networks
GRAIL: scalable reachability index for large graphs
Proceedings of the VLDB Endowment
Graph homomorphism revisited for graph matching
Proceedings of the VLDB Endowment
Efficient external-memory bisimulation on DAGs
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Hi-index | 0.00 |
A strongly connected component (SCC) is a maximal subgraph of a directed graph G in which every pair of nodes are reachable from each other in the SCC. With such a property, a general directed graph can be represented by a directed acyclic graph DAG by contracting an SCC of G to a node in DAG. In many real applications that need graph pattern matching, topological sorting, or reachability query processing, the best way to deal with a general directed graph is to deal with its DAG representation. Therefore, finding all SCCs in a directed graph G is a critical operation. The existing in-memory algorithms based on depth first search (DFS) can find all SCCs in linear time w.r.t. the size of a graph. However, when a graph cannot resident entirely in the main memory, the existing external or semi-external algorithms to find all SCCs have limitation to achieve high I/O efficiency. In this paper, we study new I/O efficient semi-external algorithms to find all SCCs for a massive directed graph G that cannot reside in main memory entirely. To overcome the deficiency of the existing DFS based semi-external algorithm that heavily relies on a total order, we explore a weak order based on which we investigate new algorithms. We propose a new two phase algorithm, namely, tree construction and tree search. In the tree construction phase, a spanning tree of G can be constructed in bounded sequential scans of G. In the tree search phase, it needs to sequentially scan the graph once to find all SCCs. In addition, we propose a new single phase algorithm, which combines the tree construction and tree search phases into a single phase, with three new optimization techniques. They are early acceptance, early rejection, and batch processing. By the single phase algorithm with the new optimization techniques, we can significantly reduce the number of I/Os and CPU cost. We conduct extensive experimental studies using 4 real datasets including a massive real dataset, and several synthetic datasets to confirm the I/O efficiency of our approaches.