Record linkage analysis, which matches records from different data sets that refer to the same real-world entity, is an important task in data integration. Record linkages are often uncertain because the data are incomplete or ambiguous. Fortunately, state-of-the-art probabilistic record linkage methods can compute the probability that two records refer to the same entity. In this paper, we study novel aggregate queries on probabilistic record linkages, such as counting the number of matched records. We address several fundamental issues. First, we advocate that the answer to an aggregate query on probabilistic record linkages is a probability distribution over the possible answers, derived from possible worlds. Second, we identify the category of compatible linkages, the only linkages on which answers to aggregate queries can be determined properly when the probabilities of individual linkages are available but the joint distributions of multiple linkages are not. Third, we give a quadratic exact algorithm and two approximation algorithms to answer aggregate queries.
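To make the first point concrete, the following minimal sketch shows what "a probability distribution of possible answers" looks like for a count query. It is an illustration only, not the algorithm of the paper: it assumes that the linkages are mutually independent, whereas the abstract explicitly concerns the setting in which joint distributions are unavailable. The function name `count_distribution` and the example probabilities are hypothetical.

```python
from typing import List


def count_distribution(probs: List[float]) -> List[float]:
    """Return dist, where dist[k] is the probability that exactly k linkages hold.

    ASSUMPTION (not from the source): linkages are mutually independent, so the
    count follows a Poisson-binomial distribution. The dynamic program below
    takes O(n^2) time for n linkages.
    """
    dist = [1.0]  # with no linkages considered, the count is 0 with probability 1
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)  # worlds in which this linkage does not hold
            nxt[k + 1] += mass * p      # worlds in which this linkage holds
        dist = nxt
    return dist


# Hypothetical linkage probabilities for three candidate record pairs.
linkage_probs = [0.9, 0.5, 0.2]
dist = count_distribution(linkage_probs)
for k, pr in enumerate(dist):
    print(f"P(count = {k}) = {pr:.2f}")
```

Each entry of the result sums the probabilities of all possible worlds that yield the same count, so the entries add up to 1. Without the independence assumption this simple product form no longer applies.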