Large-scale bisimulation of RDF graphs

Authors: Alexander Schätzle, Antony Neu, Georg Lausen, Martin Przyjaciel-Zablocki
Affiliation: University of Freiburg, Freiburg, Germany (all authors)
Venue: Proceedings of the Fifth Workshop on Semantic Web Information Management
Year: 2013

Abstract

RDF datasets with billions of triples are no longer unusual, and they continue to grow (e.g., the Linked Open Data (LOD) cloud), driven by the inherent flexibility of RDF, which can represent very diverse datasets ranging from highly structured to unstructured data. Because of their size, understanding and processing RDF graphs is often difficult, so methods that reduce a graph's size while preserving as much of its structural information as possible become attractive. In this paper we study bisimulation as a means to reduce the size of RDF graphs according to structural equivalence. We examine two bisimulation algorithms, one for sequential execution using SQL and one for distributed execution using MapReduce. We demonstrate that the MapReduce-based implementation scales linearly with the number of RDF triples, making it possible to compute the bisimulation of very large RDF graphs within times that are far out of reach for the sequential version. Experiments on synthetic benchmark data and real data (DBpedia) show a reduction of more than 90% in graph size, measured as the number of nodes in the RDF graph versus the number of blocks in the resulting bisimulation partition.
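The abstract does not spell out either algorithm, so what follows is only a minimal sketch of the general idea, assuming a plain forward bisimulation over an edge-labeled graph (predicates as edge labels, no node labels) computed by signature-based partition refinement. It is not the authors' SQL or MapReduce implementation, and the function name `bisimulation_partition` and the toy triples are hypothetical. Each round gives every node a signature made of its current block plus the set of (predicate, object block) pairs on its outgoing edges; nodes with equal signatures form the next blocks, and the loop stops once no block splits.

```python
from collections import defaultdict


def bisimulation_partition(triples: list[tuple[str, str, str]]) -> dict[str, int]:
    """Map every node to a block id of the coarsest forward bisimulation.

    Hypothetical helper: signature-based partition refinement, assumed for
    illustration; not the paper's SQL or MapReduce code.
    """
    nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}
    out_edges: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for s, p, o in triples:
        out_edges[s].append((p, o))

    block = {n: 0 for n in nodes}  # initial partition: one block for all nodes
    num_blocks = 1
    while True:
        # Signature of n: its current block plus the set of
        # (predicate, block of object) pairs on its outgoing edges.
        sigs = {
            n: (block[n], frozenset((p, block[o]) for p, o in out_edges[n]))
            for n in nodes
        }
        # Nodes with equal signatures share a block id in the next partition.
        ids: dict = {}
        new_block = {n: ids.setdefault(sigs[n], len(ids)) for n in nodes}
        # Including block[n] in the signature means blocks can only split, so
        # an unchanged block count means the partition is stable.
        if len(ids) == num_blocks:
            return block
        block, num_blocks = new_block, len(ids)


triples = [
    ("alice", "knows", "bob"),
    ("carol", "knows", "dave"),
    ("bob", "name", "lit1"),
    ("dave", "name", "lit2"),
    ("erin", "name", "lit3"),
]
partition = bisimulation_partition(triples)
print(len(partition), "nodes ->", len(set(partition.values())), "blocks")  # 8 nodes -> 3 blocks
```

On the toy graph the eight nodes collapse into three blocks: {alice, carol}, {bob, dave, erin} and {lit1, lit2, lit3}. Building signatures amounts to grouping triples by subject, which is the kind of step that maps naturally onto a MapReduce job; how the paper actually distributes this work is not described here. In this simplified variant every node without outgoing edges (e.g., a literal) ends up in the same block; a practical system might seed the initial partition differently.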