In this paper, we present what are, to our knowledge, the first I/O-efficient solutions for computing the k-bisimulation partition of a massive directed graph and for maintaining such a partition upon updates to the underlying graph. Ubiquitous in the theory and application of graph data, bisimulation is a robust notion of node equivalence that intuitively groups together nodes sharing fundamental structural features. k-bisimulation is the standard variant of bisimulation in which the topological features of nodes are considered only within a local neighborhood of radius k ≥ 0. The I/O cost of our partition construction algorithm is bounded by O(k · sort(|Et|) + k · scan(|Nt|) + sort(|Nt|)), while that of our maintenance algorithms is bounded by O(k · sort(|Et|) + k · scan(|Nt|)). The space complexity bounds are O(|Nt| + |Et|) and O(k · |Nt| + k · |Et|), respectively. Here, |Et| and |Nt| are the numbers of disk pages occupied by the input graph's edge set and node set, respectively, and sort(n) and scan(n) are the costs of sorting and scanning, respectively, a file occupying n pages in external memory. Empirical analysis on a variety of massive real-world and synthetic graph datasets shows that our algorithms perform efficiently in practice, scaling gracefully as graphs grow in size.
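To make the notion of k-bisimulation concrete, the following is a minimal in-memory sketch of partition refinement by signatures. It is not the I/O-efficient algorithm of the paper: it assumes the whole graph fits in main memory, that nodes carry labels, and that edges are unlabeled. The function name `k_bisimulation_partition` and its input format are hypothetical, chosen only for illustration. In round 0, nodes are grouped by label; in each of the k following rounds, a node's signature combines its label block with the set of blocks its successors belonged to in the previous round.

```python
from collections import defaultdict


def k_bisimulation_partition(nodes, edges, k):
    """Illustrative in-memory sketch (not the external-memory algorithm).

    nodes: dict mapping node -> label
    edges: iterable of (source, target) pairs (edge labels are ignored here)
    k:     neighborhood radius, k >= 0
    Returns a dict mapping node -> block id; two nodes receive the same id
    exactly when their signatures agree after k refinement rounds.
    """
    successors = defaultdict(set)
    for source, target in edges:
        successors[source].add(target)

    # Round 0: nodes are grouped by label alone.
    label_ids = {}
    initial = {}
    for node, label in nodes.items():
        initial[node] = label_ids.setdefault(label, len(label_ids))

    block = dict(initial)
    for _ in range(k):
        signature_ids = {}
        refined = {}
        for node in nodes:
            # Signature: own label block plus the previous-round blocks
            # of all direct successors.
            signature = (
                initial[node],
                frozenset(block[succ] for succ in successors[node]),
            )
            refined[node] = signature_ids.setdefault(signature, len(signature_ids))
        block = refined
    return block


if __name__ == "__main__":
    labels = {1: "A", 2: "A", 3: "B", 4: "B", 5: "C"}
    arcs = [(1, 3), (2, 4), (3, 5)]
    for radius in range(3):
        print(radius, k_bisimulation_partition(labels, arcs, radius))
```

In the small example, nodes 3 and 4 share a label but only node 3 has a successor, so they are separated at radius 1; that split then propagates to their predecessors 1 and 2 at radius 2. Each round needs only the previous round's block ids, which is consistent with the factor k in the stated cost bounds, although this sketch does none of the sorting and scanning that an external-memory version would rely on.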