The Web of Data has emerged as a means to expose, share, reuse, and connect information on the Web, identified by URIs and represented in the RDF data model, following the Linked Data principles. However, the reuse of third-party data can be compromised without proper data quality assessment. In this context, important questions arise: how can one trust published data and links? Which manipulation, modification, and integration operations were applied to the data before publication? What is the nature of the comparisons or transformations applied to the data during the interlinking process? In this scenario, provenance becomes a fundamental element. In this paper, we describe an approach for generating and capturing Linked Open Provenance (LOP) to support data quality and trustworthiness assessments, covering the process from the preparation and format transformation of traditional data sources up to dataset publication and interlinking. The proposed architecture relies on provenance agents, orchestrated by an ETL workflow, to collect provenance at any specified level of detail and to link it with the corresponding data. We also describe a real use case in which the architecture was implemented to evaluate the proposal.
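As a minimal sketch of the kind of output such a provenance agent might emit for a single ETL step, the snippet below serializes OPM/PROV-style triples linking a derived dataset to its source, the transformation activity, and the responsible agent. The vocabulary terms follow the W3C PROV ontology; the `record_step` helper, the `example.org` namespace, and the resource names are illustrative assumptions, not part of the architecture described in the paper.

```python
# Hypothetical provenance agent for one ETL step: emits PROV-style
# N-Triples relating a generated dataset to its source, the activity
# that produced it, and the agent responsible for that activity.
# All local names and namespaces below are illustrative.

PROV = "http://www.w3.org/ns/prov#"   # W3C PROV vocabulary
EX = "http://example.org/"            # hypothetical data namespace

def iri(ns, local):
    """Wrap a namespace + local name as an N-Triples IRI."""
    return f"<{ns}{local}>"

def record_step(activity, used, generated, agent):
    """Return N-Triples capturing the provenance of one ETL step."""
    triples = [
        (iri(EX, generated), iri(PROV, "wasDerivedFrom"), iri(EX, used)),
        (iri(EX, generated), iri(PROV, "wasGeneratedBy"), iri(EX, activity)),
        (iri(EX, activity), iri(PROV, "used"), iri(EX, used)),
        (iri(EX, activity), iri(PROV, "wasAssociatedWith"), iri(EX, agent)),
    ]
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)

# Example: a CSV-to-RDF transformation step inside an ETL workflow.
print(record_step("csv2rdf-run-1", "legacy.csv", "dataset-v1", "etl-workflow"))
```

Because the provenance is itself RDF, it can be published alongside the dataset and interlinked with the data it describes, which is what enables the downstream quality and trustworthiness assessments the paper targets.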