Efficiency and precision trade-offs in graph summary algorithms

Authors:
Stéphane Campinas;Renaud Delbru;Giovanni Tummarello
Affiliations:
National University of Ireland, Galway;National University of Ireland, Galway;National University of Ireland, Galway
Venue:
Proceedings of the 17th International Database Engineering & Applications Symposium
Year:
2013



Abstract

In many applications, it is convenient to substitute a large data graph with a smaller homomorphic graph. This paper investigates approaches for summarising massive data graphs. In general, massive data graphs are processed using a shared-nothing infrastructure such as MapReduce. However, accurate graph summarisation algorithms are suboptimal for this kind of environment because they require multiple iterations over the data graph. We investigate approximate graph summarisation algorithms that are efficient to compute in a shared-nothing infrastructure. We define a model for assessing the quality of a summary with regard to a gold standard summary. We evaluate the trade-offs between the efficiency and precision of the algorithms over several datasets. With regard to an application, the experiments highlight the need to trade off the precision and volume of a graph summary against the complexity of the summarisation technique.
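
To make the idea of a smaller homomorphic graph concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a graph given as (source, label, target) triples and groups nodes by the set of their outgoing edge labels, a criterion chosen here purely for illustration. The names `summarise` and `out_label_key` are hypothetical. Because the grouping needs only one pass over the edges, it illustrates the kind of approximate summary that avoids repeated iterations over the data graph.

```python
from collections import defaultdict


def out_label_key(edges):
    """Return a function mapping a node to the frozenset of its outgoing labels.

    Assumption: grouping by outgoing labels is an illustrative criterion only.
    """
    labels = defaultdict(set)
    for source, label, _target in edges:
        labels[source].add(label)
    # .get() avoids inserting new keys for nodes without outgoing edges
    return lambda node: frozenset(labels.get(node, ()))


def summarise(edges, key):
    """Collapse nodes with equal key(node) into one block (a quotient graph)."""
    block_of = {}
    for source, _label, target in edges:
        for node in (source, target):
            if node not in block_of:
                block_of[node] = key(node)
    # Each data edge is mapped onto an edge between the blocks of its endpoints,
    # so the node-to-block mapping is a graph homomorphism by construction.
    summary_edges = {(block_of[s], label, block_of[t]) for s, label, t in edges}
    return block_of, summary_edges


# Toy data graph (hypothetical example data)
edges = [
    ("a", "knows", "b"),
    ("b", "knows", "c"),
    ("a", "name", "x"),
    ("b", "name", "y"),
    ("c", "name", "z"),
]

block_of, summary_edges = summarise(edges, out_label_key(edges))
print(len(set(block_of.values())), "blocks,", len(summary_edges), "summary edges")
```

In this toy graph, six nodes and five edges collapse into three blocks and four summary edges. A more precise summary, such as one based on bisimulation, would refine these blocks further at the cost of additional passes over the data.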