In probabilistic databases, lineage is fundamental both to query processing and to understanding the data. Current systems such as Trio and MystiQ take a complete approach, in which the lineage of a tuple t is a Boolean formula that represents all derivations of t. In large databases, lineage formulas can become huge: in one public database (the Gene Ontology), we often observed 10 MB of lineage (provenance) data for a single tuple. In this paper we propose approximate lineage, a much smaller formula that tracks only the most important derivations and that the system can use to process queries and provide explanations. We discuss two specific kinds of approximate lineage in detail: (1) sufficient lineage, a conservative approximation that records the most important derivations for each tuple, and (2) polynomial lineage, a more aggressive approximation, based on Fourier approximations of Boolean expressions, that can provide higher compression ratios. We define approximate lineage formally, describe algorithms to compute it, prove their error bounds, and validate our approach experimentally on a real data set.
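To make the idea of sufficient lineage concrete, here is a minimal sketch in Python. It is not the paper's algorithm. It assumes that lineage is a monotone DNF formula (a disjunction of derivations, each a conjunction of base tuples), that base tuples are independent, and that "most important" can be approximated by ranking derivations by their individual probability. The names `dnf_prob`, `sufficient_lineage`, and the toy data are hypothetical. Because dropping disjuncts from a DNF yields a formula that implies the original, the probability of the shortened formula never exceeds that of the full lineage, which is what makes the approximation conservative.

```python
from itertools import product
from math import prod


def term_prob(term, p):
    """Probability that every base tuple in one derivation is present
    (assumes independent base tuples)."""
    return prod(p[v] for v in term)


def dnf_prob(terms, p):
    """Exact probability that a monotone DNF is true, by enumerating all
    possible worlds. Exponential in the number of variables, so this is
    only suitable for toy examples."""
    variables = sorted({v for t in terms for v in t})
    total = 0.0
    for bits in product([False, True], repeat=len(variables)):
        world = dict(zip(variables, bits))
        # The formula holds if at least one derivation is fully present.
        if any(all(world[v] for v in t) for t in terms):
            weight = 1.0
            for v in variables:
                weight *= p[v] if world[v] else 1.0 - p[v]
            total += weight
    return total


def sufficient_lineage(terms, p, k):
    """Keep the k derivations with the highest individual probability.
    This ranking is an illustrative heuristic, not the paper's method."""
    return sorted(terms, key=lambda t: term_prob(t, p), reverse=True)[:k]


# Hypothetical lineage with three derivations over six base tuples.
lineage = [("x1", "y1"), ("x2", "y2"), ("x3", "y3")]
p = {"x1": 0.9, "y1": 0.9, "x2": 0.5, "y2": 0.5, "x3": 0.1, "y3": 0.1}

approx = sufficient_lineage(lineage, p, k=2)
full_prob = dnf_prob(lineage, p)
approx_prob = dnf_prob(approx, p)

print("kept derivations:", approx)
print("full lineage probability:", full_prob)
print("sufficient lineage probability:", approx_prob)
print("one-sided error:", full_prob - approx_prob)
```

In this toy case the dropped derivation contributes very little probability mass, so the shorter formula loses only a small amount of accuracy while storing two derivations instead of three.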