Automatic discovery of high-level provenance using semantic similarity

Authors:
Tom De Nies;Sam Coppens;Davy Van Deursen;Erik Mannens;Rik Van de Walle
Affiliations:
Department of Electronics and Information Systems, Multimedia Lab, Ghent University - IBBT, Ledeberg-Ghent, Belgium;Department of Electronics and Information Systems, Multimedia Lab, Ghent University - IBBT, Ledeberg-Ghent, Belgium;Department of Electronics and Information Systems, Multimedia Lab, Ghent University - IBBT, Ledeberg-Ghent, Belgium;Department of Electronics and Information Systems, Multimedia Lab, Ghent University - IBBT, Ledeberg-Ghent, Belgium;Department of Electronics and Information Systems, Multimedia Lab, Ghent University - IBBT, Ledeberg-Ghent, Belgium
Venue:
IPAW'12 Proceedings of the 4th international conference on Provenance and Annotation of Data and Processes
Year:
2012


Abstract

As interest in provenance grows within the Semantic Web community, it is recognized as a useful tool across many domains. However, existing automatic provenance collection techniques are not universally applicable. Most existing methods either rely on (low-level) observed provenance or require the user to disclose formal workflows. In this paper, we propose a new approach for the automatic discovery of provenance at multiple levels of granularity. To accomplish this, we detect entity derivations, relying on clustering algorithms, linked data, and semantic similarity. The resulting derivations are structured in compliance with the Provenance Data Model (PROV-DM). While the proposed approach is purposely kept general, allowing adaptation to many use cases, we provide an implementation for one of these use cases, namely discovering the sources of news articles. With this implementation, we were able to detect 73% of the original sources of 410 news stories, at 68% precision. Lastly, we discuss possible improvements and future work.
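
To make the idea of similarity-based derivation detection concrete, the following is a minimal sketch and not the authors' implementation. It assumes that plain bag-of-words cosine similarity can stand in for the semantic similarity measure, that each document carries a timestamp, and that a fixed threshold decides whether a later document was derived from an earlier one. It omits the clustering and linked data steps entirely. The function names, the `ex:` identifiers, and the threshold of 0.5 are all hypothetical. Output is written as PROV-N style `wasDerivedFrom(generated, used)` statements.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercased bag-of-words counts (assumed stand-in for semantic features)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def discover_derivations(docs, threshold=0.5):
    """docs: list of (entity_id, timestamp, text) tuples.

    Returns PROV-N style statements linking each document to every
    earlier document whose similarity reaches the (hypothetical) threshold.
    """
    vectors = {entity_id: term_vector(text) for entity_id, _, text in docs}
    ordered = sorted(docs, key=lambda d: d[1])  # oldest first
    statements = []
    for i, (later_id, _, _) in enumerate(ordered):
        for earlier_id, _, _ in ordered[:i]:
            if cosine(vectors[later_id], vectors[earlier_id]) >= threshold:
                # In PROV-N the generated entity comes first, the used entity second.
                statements.append(f"wasDerivedFrom({later_id}, {earlier_id})")
    return statements


# Illustrative, made-up documents.
docs = [
    ("ex:wire", 1, "Ghent University researchers present provenance discovery method"),
    ("ex:article", 2, "Researchers at Ghent University present a provenance discovery method"),
    ("ex:other", 3, "Local football club wins championship final"),
]

for statement in discover_derivations(docs):
    print(statement)
```

In this toy input only the second document shares vocabulary with the first, so a single derivation is reported. A realistic pipeline would replace the token overlap with a measure that captures meaning rather than surface wording, and would restrict the pairwise comparison to documents within the same cluster.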