In this paper we introduce the first efficient external-memory algorithm for computing the bisimilarity equivalence classes of a directed acyclic graph (DAG). DAGs are commonly used to model data in a wide variety of practical applications, ranging from XML documents and data provenance models to web taxonomies and scientific workflows. In the study of efficient reasoning over massive graphs, the notion of node bisimilarity plays a central role. For example, grouping together bisimilar nodes in an XML data set is the first step in many sophisticated approaches to building index structures for efficient XPath query evaluation. To date, however, only internal-memory bisimulation algorithms have been investigated. Because real-world DAG data sets often exceed the available main memory, they must be stored in external memory, so there is a practical need for an efficient approach to computing bisimulation in external memory. Our general algorithm has a worst-case I/O complexity of O(Sort(|N| + |E|)), where |N| and |E| are the numbers of nodes and edges, respectively, in the data graph, and Sort(n) is the number of accesses to external memory needed to sort an input of size n. We also study specializations of this algorithm to common variations of bisimulation for tree-structured XML data sets. We empirically evaluate the algorithms on graphs and XML documents with billions of nodes and edges, and find that they process such graphs efficiently even when very little internal memory is available. The proposed algorithms are simple enough for practical implementation and use, and they open the door to further study of external-memory bisimulation algorithms. To this end, the full open-source C++ implementation has been made freely available.
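To make concrete what is being computed, here is a minimal internal-memory sketch in Python. It assumes node-labeled DAGs with unlabeled edges and the standard successor-based notion of bisimulation. On a DAG, two nodes are bisimilar exactly when they carry the same label and their children fall into the same set of equivalence classes, so classes can be assigned bottom-up, children before parents. The function name `bisimulation_classes` and its input format are hypothetical. This is not the paper's external-memory algorithm and makes no attempt at I/O efficiency; it only illustrates the equivalence classes that such an algorithm produces.

```python
from collections import defaultdict, deque
from typing import Dict, FrozenSet, Hashable, List, Tuple

Node = Hashable
Signature = Tuple[str, FrozenSet[int]]


def bisimulation_classes(
    labels: Dict[Node, str],
    edges: List[Tuple[Node, Node]],
) -> Dict[Node, int]:
    """Map each node to a block id; nodes share an id iff they are bisimilar.

    Hypothetical helper. Assumes a DAG whose nodes all appear in `labels`
    and whose edges are unlabeled. Internal-memory illustration only.
    """
    children: Dict[Node, List[Node]] = defaultdict(list)
    parents: Dict[Node, List[Node]] = defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)
        parents[dst].append(src)

    # A node becomes ready once all of its children have been assigned a block.
    pending = {v: len(children[v]) for v in labels}
    ready = deque(v for v, n in pending.items() if n == 0)

    block_of: Dict[Node, int] = {}
    block_ids: Dict[Signature, int] = {}  # signature -> block id
    while ready:
        v = ready.popleft()
        # Signature = own label + the *set* of child blocks; duplicate child
        # blocks collapse, since bisimulation ignores how many such children exist.
        sig = (labels[v], frozenset(block_of[c] for c in children[v]))
        block_of[v] = block_ids.setdefault(sig, len(block_ids))
        for p in parents[v]:
            pending[p] -= 1
            if pending[p] == 0:
                ready.append(p)

    if len(block_of) != len(labels):
        # Some nodes never became ready, so they lie on or above a cycle.
        raise ValueError("input graph contains a cycle")
    return block_of


# Usage: a small XML-like tree in which both "b" subtrees are bisimilar.
labels = {"r": "a", "x": "b", "y": "b", "p": "c", "q": "c"}
edges = [("r", "x"), ("r", "y"), ("x", "p"), ("y", "q")]
blocks = bisimulation_classes(labels, edges)
```

The in-memory dictionary of signatures is exactly the part that presumes the whole graph fits in main memory, which is the assumption the external-memory setting described above removes.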