Representing uncertain data: models, properties, and algorithms

Authors:
Anish Das Sarma; Omar Benjelloun; Alon Halevy; Shubha Nabar; Jennifer Widom
Affiliations:
Stanford University, Stanford, USA; Google Inc., Mountain View, USA; Google Inc., Mountain View, USA; Microsoft Corp, Redmond, USA; Stanford University, Stanford, USA
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2009

Abstract

In general terms, an uncertain relation encodes a set of possible certain relations. There are many ways to represent uncertainty, ranging from alternative values for attributes to rich constraint languages. Among the possible models for uncertain data, there is a tension between simple, intuitive models, which tend to be incomplete, and complete models, which tend to be nonintuitive and more complex than many applications need. We present a space of models for representing uncertain data based on a variety of uncertainty constructs and tuple-existence constraints, and we explore a number of their properties. We study the completeness of the models and their closure under relational operations, and we give results relating closure and completeness. We then examine whether the different models guarantee unique representations of uncertain data; for those that do not, we provide complexity results and algorithms for testing equivalence of representations. Next, we consider the problem of minimizing the size of representations in these models, showing that minimizing the number of tuples also minimizes the size of the constraints. We show that minimization is intractable in general, and we study the more restricted problem of maintaining minimality incrementally as operations are performed. Finally, we present several results on approximating uncertain data in a model that is not expressive enough to represent it exactly.
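The possible-worlds reading in the abstract's first sentence, and the reason equivalence and minimization questions arise at all, can be illustrated with a small sketch. It assumes a simplified, hypothetical encoding that is not the paper's notation: an uncertain relation is a list of x-tuples, each a pair `(alternatives, maybe)`, where `alternatives` lists ordinary tuples and `maybe=True` lets the x-tuple contribute nothing. The function names `possible_worlds` and `equivalent` are illustrative only, and the equivalence test is plain brute-force enumeration (exponential in the number of x-tuples), not the algorithms or complexity results the paper presents.

```python
from itertools import product

# Hypothetical encoding for illustration only (not the paper's notation):
# an uncertain relation is a list of x-tuples; each x-tuple is a pair
# (alternatives, maybe) where `alternatives` is a list of ordinary tuples
# and maybe=True means the x-tuple may contribute no tuple at all.


def possible_worlds(relation):
    """Return the set of certain relations (as frozensets) that `relation` encodes."""
    choices = []
    for alternatives, maybe in relation:
        options = [frozenset([alt]) for alt in alternatives]
        if maybe:
            options.append(frozenset())  # this x-tuple is absent in the world
        choices.append(options)
    # Each x-tuple chooses independently; a world is the union of the choices.
    return {frozenset().union(*picks) for picks in product(*choices)}


def equivalent(r1, r2):
    """Brute-force equivalence test: same possible worlds (exponential in size)."""
    return possible_worlds(r1) == possible_worlds(r2)


# Alternative values for one attribute: Alice's city is NY or LA.
r = [([("Alice", "NY"), ("Alice", "LA")], False)]
print(sorted(sorted(w) for w in possible_worlds(r)))
# → [[('Alice', 'LA')], [('Alice', 'NY')]]

# Two syntactically different representations of the same world set {{a}, {}}.
redundant = [([("a",)], True), ([("a",)], True)]
minimal = [([("a",)], True)]
print(equivalent(redundant, minimal))
# → True
```

Here `redundant` and `minimal` encode the same possible worlds, so representations in this simple model are not unique, and the second uses fewer x-tuples, which is the kind of redundancy minimization removes. The same encoding also shows why such simple models tend to be incomplete: the world set {{a, b}, {}} cannot be produced, because obtaining the empty world requires every x-tuple to be optional, and then dropping the x-tuples that chose b from the {a, b} world yields the extra world {a}. Expressing such correlations needs something richer, such as tuple-existence constraints.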