Entity Resolution in Texts Using Statistical Learning and Ontologies

  • Authors:
  • Tadej Štajner; Dunja Mladenić

  • Affiliations:
  • Jožef Stefan Institute, Ljubljana, Slovenia 1000; Jožef Stefan Institute, Ljubljana, Slovenia 1000

  • Venue:
  • ASWC '09 Proceedings of the 4th Asian Conference on The Semantic Web
  • Year:
  • 2009

Abstract

Ambiguities, which are inherently present in natural languages, make it challenging to determine the actual identities of entities mentioned in a document (e.g., Paris can refer to a city in France, but it can also refer to a small city in Texas, USA, or to a 1984 film directed by Wim Wenders titled Paris, Texas). Disambiguation is a problem that can be successfully solved by entity resolution methods. This paper studies various methods for estimating relatedness between entities, used in collective entity resolution. We define a unified entity resolution approach, capable of using implicit as well as explicit relatedness for collectively identifying in-text entities. As a relatedness measure, we propose a method that expresses relatedness using the heterogeneous relations of a domain ontology. We also experiment with other relatedness measures, such as statistical learning of co-occurrences of two entities or content similarity between them. Evaluation on real data shows that the new methods for relatedness estimation give good results.
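
To make the idea of collective entity resolution with a co-occurrence relatedness measure concrete, here is a minimal toy sketch. It is not the authors' method: the entity names, the document-ID sets in OCCURRENCES, the Jaccard-based relatedness function, and the brute-force search in resolve are all assumptions chosen purely for illustration. The sketch picks one candidate entity per mention so that the chosen entities are, taken together, as related as possible.

```python
from itertools import combinations, product

# Hypothetical toy data (an assumption, not from the paper):
# for each candidate entity, the IDs of documents in which it occurs.
OCCURRENCES = {
    "Paris_(France)": {1, 2, 3, 4},
    "Paris_(Texas)": {5, 6},
    "Paris,_Texas_(film)": {7, 8, 9},
    "Wim_Wenders": {7, 8, 10},
    "Eiffel_Tower": {1, 2, 11},
}


def relatedness(a, b):
    """Jaccard overlap of the document sets in which two entities occur."""
    docs_a, docs_b = OCCURRENCES[a], OCCURRENCES[b]
    union = docs_a | docs_b
    return len(docs_a & docs_b) / len(union) if union else 0.0


def resolve(candidates):
    """Choose one entity per mention, maximizing total pairwise relatedness.

    `candidates` maps each mention string to its list of candidate entities.
    The search is exhaustive, so it is only suitable for tiny examples.
    """
    mentions = list(candidates)
    best_score, best = -1.0, None
    for choice in product(*(candidates[m] for m in mentions)):
        score = sum(relatedness(a, b) for a, b in combinations(choice, 2))
        if score > best_score:
            best_score, best = score, dict(zip(mentions, choice))
    return best


PARIS = ["Paris_(France)", "Paris_(Texas)", "Paris,_Texas_(film)"]
with_director = resolve({"Paris": PARIS, "Wenders": ["Wim_Wenders"]})
with_landmark = resolve({"Paris": PARIS, "tower": ["Eiffel_Tower"]})
```

In this toy setting, the same mention "Paris" resolves differently depending on which other entity appears alongside it, which is the intuition behind resolving mentions collectively rather than one at a time. A relatedness function derived from ontology relations or content similarity could be swapped in without changing resolve.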