Querying data provenance

Authors:
Grigoris Karvounarakis; Zachary G. Ives; Val Tannen
Affiliations:
LogicBlox, Atlanta, GA, USA and ICS-FORTH, Heraklion, Greece; University of Pennsylvania, Philadelphia, PA, USA; University of Pennsylvania, Philadelphia, PA, USA
Venue:
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Year:
2010

Abstract

Many advanced data management operations (e.g., incremental maintenance, trust assessment, debugging schema mappings, keyword search over databases, or query answering in probabilistic databases) involve computations that look at how a tuple was produced, e.g., to determine its score or existence. This requires answers to queries such as "Is this data derivable from trusted tuples?", "What tuples are derived from this relation?", or "What score should this answer receive, given initial scores of the base tuples?". Such questions can be answered by consulting the provenance of query results. In recent years there has been significant progress on formal models for provenance. However, the issues of provenance storage, maintenance, and querying have not yet been addressed in an application-independent way. In this paper, we adopt the most general formalism for tuple-based provenance, semiring provenance. We develop a query language for provenance that can express all of the aforementioned types of queries and many others; we propose storage, processing, and indexing schemes for data provenance in support of these queries; and we experimentally validate the feasibility of provenance querying and the benefits of our indexing techniques across a variety of application classes and queries.
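To make the example questions concrete, here is a minimal Python sketch of the general semiring-provenance idea. An answer's provenance is stored once as a polynomial over base-tuple identifiers and then evaluated in different semirings: Boolean for trust, (min, +) for cost-style scores, counting for the number of derivations, and a lineage semiring for the set of base tuples an answer depends on. The answer polynomial `t1*t2 + t3`, the tuple ids, and the names `Semiring`, `evaluate`, `BOOLEAN`, `TROPICAL`, `COUNTING`, and `LINEAGE` are hypothetical illustrations. This is not the paper's query language, storage scheme, or indexing technique.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple

# A provenance polynomial: a sum of monomials, where each monomial is a
# product of base-tuple ids (e.g. tuples combined by a join).
Polynomial = List[Tuple[str, ...]]


class Semiring(NamedTuple):
    zero: object                             # identity for plus
    one: object                              # identity for times
    plus: Callable[[object, object], object]   # alternative derivations
    times: Callable[[object, object], object]  # joint use of tuples


def evaluate(poly: Polynomial, k: Semiring, valuation: Dict[str, object]) -> object:
    """Interpret a stored polynomial in semiring k, given values for base tuples."""
    total = k.zero
    for monomial in poly:
        term = k.one
        for tuple_id in monomial:
            term = k.times(term, valuation[tuple_id])
        total = k.plus(total, term)
    return total


# Trust: is the answer derivable using only trusted base tuples?
BOOLEAN = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)

# Cost-style score: cheapest derivation, summing base-tuple costs.
TROPICAL = Semiring(float("inf"), 0.0, min, lambda a, b: a + b)

# Bag semantics: how many distinct derivations exist.
COUNTING = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)


# Lineage: which base tuples contribute. None plays the role of the
# bottom element (identity for plus, annihilator for times).
def _lin_plus(a, b):
    if a is None:
        return b
    if b is None:
        return a
    return a | b


def _lin_times(a, b):
    if a is None or b is None:
        return None
    return a | b


LINEAGE = Semiring(None, frozenset(), _lin_plus, _lin_times)

# Hypothetical answer tuple: derived by joining t1 with t2, or directly from t3.
answer_prov: Polynomial = [("t1", "t2"), ("t3",)]

if __name__ == "__main__":
    print(evaluate(answer_prov, BOOLEAN, {"t1": True, "t2": False, "t3": True}))  # True
    print(evaluate(answer_prov, TROPICAL, {"t1": 1.0, "t2": 2.0, "t3": 5.0}))    # 3.0
    print(evaluate(answer_prov, COUNTING, {"t1": 1, "t2": 1, "t3": 1}))          # 2
    lineage = {t: frozenset({t}) for t in ("t1", "t2", "t3")}
    print(sorted(evaluate(answer_prov, LINEAGE, lineage)))  # ['t1', 't2', 't3']
```

The point of the design is that the polynomial is computed and stored once, independent of any application. Each question then becomes a different evaluation of the same stored expression rather than a separate bookkeeping mechanism.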