Identity uncertainty

Authors:
Hanna Maria Pasula; Stuart J. Russell
Venue:
Identity uncertainty
Year:
2003

Abstract

Recent work in AI has made clear the advantages of combining probability theory with the expressive power of first-order logic. One promising approach is based on the concept of possible worlds, in which a probability measure is defined over the interpretations of a logical knowledge base. This approach has been used successfully to add probabilistic elements to representations based on semantic networks and logic programming. However, all of the representations developed to date make the unique names assumption: they assume that distinct constants in the language always refer to distinct objects. This is not always reasonable, because objects in the real world rarely carry easily observable unique identifiers, and there is often considerable uncertainty over which observations refer to the same object. We call this identity uncertainty. It is a pervasive problem in real-world data analysis, arising in settings such as database merging, feature correspondence, and object tracking. We propose an extension to possible-worlds approaches in which the uncertainty over the mapping from terms to objects is represented explicitly, by extending the language used to define the probability distribution over possible worlds. We show that this extended language defines a unique and consistent distribution. We also suggest an approximate inference method for this setting, based on Markov chain Monte Carlo, and we have applied it to several domains, including vehicle matching and citation clustering, with promising results.
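To make identity uncertainty and MCMC inference concrete, here is a minimal Python sketch under loudly stated assumptions. It is a toy stand-in, not the authors' model or sampler. Once the unique names assumption is dropped, each possible identity mapping of n observations corresponds to a partition of those observations into underlying objects. The sketch scores every such mapping with a hypothetical pairwise token-similarity likelihood and a hypothetical per-object penalty. The names `CITATIONS`, `EPS`, `CLUSTER_PENALTY`, `similarity`, `pair_logs`, `log_score`, `set_partitions`, `exact_coref_prob` and `gibbs_coref_prob` are all illustrative. It then compares the exact probability that two citations co-refer, computed by enumerating all mappings, with an estimate from a single-site Gibbs sampler. Gibbs sampling is one simple member of the MCMC family, chosen here for brevity; the paper's own proposal moves may differ.

```python
import math
import random
from itertools import combinations

# Toy observations (hypothetical): citation strings whose true referents are unknown.
CITATIONS = [
    "pasula russell identity uncertainty",
    "identity uncertainty pasula",
    "russell norvig artificial intelligence",
    "artificial intelligence a modern approach russell norvig",
    "link mining a survey",
]

EPS = 0.05             # hypothetical clamp keeping similarities inside (0, 1)
CLUSTER_PENALTY = 1.0  # hypothetical log-prior cost per distinct latent object


def similarity(a, b):
    """Token Jaccard similarity, clamped to [EPS, 1 - EPS]."""
    ta, tb = set(a.split()), set(b.split())
    s = len(ta & tb) / len(ta | tb)
    return min(max(s, EPS), 1.0 - EPS)


def pair_logs(obs):
    """Precompute (log P(same object), log P(different objects)) for each pair."""
    logs = {}
    for a, b in combinations(range(len(obs)), 2):
        s = similarity(obs[a], obs[b])
        logs[(a, b)] = (math.log(s), math.log(1.0 - s))
    return logs


def log_score(labels, logs):
    """Unnormalized log posterior of one identity mapping (one label per observation)."""
    total = -CLUSTER_PENALTY * len(set(labels))
    for (a, b), (log_same, log_diff) in logs.items():
        total += log_same if labels[a] == labels[b] else log_diff
    return total


def set_partitions(n):
    """Yield every partition of range(n) as a restricted-growth label list."""
    def rec(i, labels, used):
        if i == n:
            yield list(labels)
            return
        for c in range(used + 1):  # join block 0..used-1, or open new block `used`
            labels.append(c)
            yield from rec(i + 1, labels, max(used, c + 1))
            labels.pop()
    yield from rec(0, [], 0)


def exact_coref_prob(n, logs, i, j):
    """P(observations i and j co-refer), by enumerating every identity mapping."""
    weighted = [(p, math.exp(log_score(p, logs))) for p in set_partitions(n)]
    z = sum(w for _, w in weighted)
    return sum(w for p, w in weighted if p[i] == p[j]) / z


def gibbs_coref_prob(n, logs, i, j, sweeps=3000, burn_in=200, seed=0):
    """Estimate the same probability with single-site Gibbs sampling (an MCMC method)."""
    rng = random.Random(seed)
    labels = list(range(n))  # start with every observation as its own object
    hits = 0
    for sweep in range(burn_in + sweeps):
        for k in range(n):
            # Options: join an object used by another observation, or a brand-new one.
            options = sorted({labels[m] for m in range(n) if m != k})
            options.append(max(labels) + 1)
            scores = []
            for c in options:
                labels[k] = c
                scores.append(log_score(labels, logs))
            top = max(scores)  # subtract the max before exponentiating, for stability
            weights = [math.exp(s - top) for s in scores]
            labels[k] = rng.choices(options, weights=weights)[0]
        if sweep >= burn_in:
            hits += labels[i] == labels[j]
    return hits / sweeps


if __name__ == "__main__":
    logs = pair_logs(CITATIONS)
    n = len(CITATIONS)
    print("identity mappings:", sum(1 for _ in set_partitions(n)))  # Bell(5) = 52
    print("exact P(0 ~ 1):", exact_coref_prob(n, logs, 0, 1))
    print("MCMC  P(0 ~ 1):", gibbs_coref_prob(n, logs, 0, 1))
```

Exhaustive enumeration is only feasible for tiny inputs, because the number of identity mappings grows as the Bell numbers (52 for five observations, 203 for six, 115,975 for ten). This growth is why an approximate method such as MCMC is needed at realistic scale. The sampler never enumerates the space. It only compares the few mappings reachable by reassigning one observation at a time.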