Entity Resolution in Texts Using Statistical Learning and Ontologies

Authors:
Tadej Štajner;Dunja Mladenić
Affiliations:
Jožef Stefan Institute, Ljubljana, Slovenia 1000;Jožef Stefan Institute, Ljubljana, Slovenia 1000
Venue:
ASWC '09 Proceedings of the 4th Asian Conference on The Semantic Web
Year:
2009

Abstract

Ambiguities, which are inherent in natural languages, make it challenging to determine the actual identities of entities mentioned in a document (e.g., Paris can refer to a city in France, but it can also refer to a small city in Texas, USA, or to the 1984 film Paris, Texas, directed by Wim Wenders). Entity resolution methods can successfully solve this disambiguation problem. This paper studies various methods for estimating relatedness between entities, as used in collective entity resolution. We define a unified entity resolution approach that can use implicit as well as explicit relatedness to collectively identify in-text entities. As a relatedness measure, we propose a method that expresses relatedness using the heterogeneous relations of a domain ontology. We also experiment with other relatedness measures, such as statistical learning of co-occurrences of two entities or content similarity between them. Evaluation on real data shows that the new methods for relatedness estimation give good results.
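
To make the idea of collective resolution with a relatedness measure concrete, the following is a minimal sketch and not the authors' implementation. It assumes a content-similarity measure (cosine similarity over bag-of-words descriptions) and an exhaustive search over candidate assignments. The entity identifiers, the toy descriptions in DESCRIPTIONS, and the function names are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations, product
from math import sqrt

# Hypothetical toy descriptions of candidate entities (not data from the paper).
DESCRIPTIONS = {
    "Paris_France": "capital city france seine europe",
    "Paris_Texas": "small city texas usa lamar county",
    "Paris_Texas_film": "1984 film directed wim wenders",
    "France": "country europe capital city paris seine",
    "Wim_Wenders": "german film director 1984 film",
}


def cosine(text_a, text_b):
    """Cosine similarity between two bag-of-words term-count vectors."""
    va, vb = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(v * v for v in va.values())) * sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def relatedness(entity_a, entity_b):
    """Assumed relatedness measure: content similarity of the descriptions."""
    return cosine(DESCRIPTIONS[entity_a], DESCRIPTIONS[entity_b])


def resolve(candidates):
    """Pick one entity per mention so that total pairwise relatedness is maximal.

    `candidates` maps each mention string to a list of candidate entity ids.
    The search is exhaustive, so it is only practical for tiny inputs.
    """
    mentions = list(candidates)
    best, best_score = None, float("-inf")
    for assignment in product(*(candidates[m] for m in mentions)):
        score = sum(relatedness(a, b) for a, b in combinations(assignment, 2))
        if score > best_score:
            best, best_score = assignment, score
    return dict(zip(mentions, best))


paris = ["Paris_France", "Paris_Texas", "Paris_Texas_film"]
with_france = resolve({"Paris": paris, "France": ["France"]})
with_wenders = resolve({"Paris": paris, "Wenders": ["Wim_Wenders"]})
```

In this toy setup the same mention, Paris, resolves differently depending on which other entities appear in the document. Swapping `relatedness` for a co-occurrence statistic or an ontology-based score would leave the `resolve` step unchanged, which is the sense in which the measure is pluggable in this sketch.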