The detection of duplicate tuples, corresponding to the same real-world entity, is an important task in data integration and cleaning. While many techniques exist to identify such tuples, the merging or elimination of duplicates can be a difficult task that relies on ad-hoc and often manual solutions. We propose a complementary approach that permits declarative query answering over duplicated data, where each duplicate is associated with a probability of being in the clean database. We rewrite queries over a database containing duplicates to return each answer with the probability that the answer is in the clean database. Our rewritten queries are sensitive to the semantics of duplication and help a user understand which query answers are most likely to be present in the clean database. The semantics that we adopt is independent of the way the probabilities are produced, but is able to effectively exploit them during query answering. In the absence of external knowledge that associates each database tuple with a probability, we offer a technique, based on tuple summaries, that automates this task. We experimentally study the performance of our rewritten queries. Our studies show that the rewriting does not introduce a significant overhead in query execution time. This work is done in the context of the ConQuer project at the University of Toronto, which focuses on the efficient management of inconsistent and dirty databases.
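The idea of returning each answer with the probability that it is in the clean database can be illustrated with a small Python sketch. It is not the rewriting used in the paper. It assumes that duplicates are already grouped into clusters, that exactly one tuple per cluster belongs to the clean database, that the probabilities within a cluster sum to 1, and that clusters are independent. The `customers` relation, its columns, and both function names are hypothetical. The sketch computes answer probabilities twice, once by enumerating every candidate clean database and once with a single grouped sum of the kind a rewritten query could compute, so the two can be compared.

```python
from collections import defaultdict
from itertools import product

# Hypothetical dirty relation: (cluster_id, name, balance, prob).
# Tuples sharing a cluster_id are duplicates of one real-world entity;
# prob is the assumed chance that the tuple is the one in the clean database.
customers = [
    ("c1", "John Smith", 2000, 0.7),
    ("c1", "J. Smith", 500, 0.3),
    ("c2", "Mary Jones", 1500, 0.5),
    ("c2", "Mary Jones", 3000, 0.5),
    ("c3", "Al Brown", 100, 1.0),
]


def answer_probs_by_worlds(rows, predicate):
    """Brute force: enumerate every candidate clean database."""
    clusters = defaultdict(list)
    for row in rows:
        clusters[row[0]].append(row)
    probs = defaultdict(float)
    # A candidate clean database keeps exactly one tuple from each cluster.
    for world in product(*clusters.values()):
        weight = 1.0
        for row in world:
            weight *= row[3]  # clusters are assumed independent
        for row in world:
            if predicate(row):
                probs[row[0]] += weight
    return dict(probs)


def answer_probs_by_grouping(rows, predicate):
    """Shortcut: sum the probabilities of the qualifying tuples per cluster."""
    probs = defaultdict(float)
    for row in rows:
        if predicate(row):
            probs[row[0]] += row[3]
    return dict(probs)


def has_high_balance(row):
    # Example selection: customers whose balance exceeds 1000.
    return row[2] > 1000


by_worlds = answer_probs_by_worlds(customers, has_high_balance)
by_grouping = answer_probs_by_grouping(customers, has_high_balance)
print(by_worlds)
print(by_grouping)
```

For this selection over a single relation, both functions report cluster `c1` with probability about 0.7 and cluster `c2` with probability about 1.0, while `c3` does not appear. The grouped sum needs one pass over the data, whereas the enumeration grows exponentially with the number of clusters. The shortcut is only valid under the assumptions stated above and for this simple query shape; joins and other query classes need more care.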