RDF-4G: algorithmic building blocks for large-scale graph analytics

Authors:
Stephan Seufert
Affiliations:
Max Planck Institute for Informatics, Saarbrücken, Germany
Venue:
Proceedings of the 2013 SIGMOD/PODS Ph.D. Symposium
Year:
2013

Abstract

We present RDF-4G, the first three miles towards a large-scale graph-analytics engine built on top of the state-of-the-art RDF engine RDF-3X. The algorithmic building blocks that make up this work help answer fundamental questions about relationships between entities in a graph-structured world. More precisely, our system provides insights into what we define as the trilogy of relationship analysis: Is there a relationship between entities? Who participates in the connection? How can the relationship be characterized? The first two questions correspond to algorithmic primitives of graph processing, namely reachability and shortest-path queries; to answer the third, we propose a novel graph-theoretic concept, relatedness cores. Our technical contributions are efficient index structures for reachability and shortest-path query processing, together with a new notion of, and algorithms for, relationship characterization. The latter can be computed efficiently using the techniques we have developed in our work on graph indexing. All our methods are integrated into the RDF-3X engine. Future work includes exposing our algorithmic building blocks to the user via extensions to SPARQL, the de facto standard query language for graph-structured data.
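To make the first two questions of the trilogy concrete, here is a minimal sketch of what reachability and shortest-path queries compute over a set of RDF-style triples. It is a plain breadth-first search, not the index structures the abstract refers to, and it does not attempt relatedness cores, which the abstract does not define. The triples, predicate names, and function names are hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("a", "knows", "b"),
    ("b", "worksAt", "c"),
    ("d", "knows", "a"),
]


def build_adjacency(triples):
    """Map every node to the set of nodes it points to, ignoring predicates."""
    adjacency = {}
    for subject, _predicate, obj in triples:
        adjacency.setdefault(subject, set()).add(obj)
        adjacency.setdefault(obj, set())
    return adjacency


def shortest_path(adjacency, source, target):
    """Return one directed path with the fewest edges, or None if none exists."""
    if source not in adjacency or target not in adjacency:
        return None
    parent = {source: None}  # doubles as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent pointers back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in adjacency[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


def is_reachable(adjacency, source, target):
    """Answer 'is there a relationship?' for a directed source/target pair."""
    return shortest_path(adjacency, source, target) is not None


ADJACENCY = build_adjacency(TRIPLES)
```

With this toy data, `is_reachable(ADJACENCY, "a", "c")` answers the first question and `shortest_path(ADJACENCY, "a", "c")` lists the participating entities for the second. Each query costs a full traversal in the worst case, which is the cost that precomputed reachability and distance indexes aim to avoid on large graphs.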