Named Entity Disambiguation: A Hybrid Statistical and Rule-Based Incremental Approach

Authors:
Hien T. Nguyen; Tru H. Cao
Affiliations:
Ton Duc Thang University, Vietnam; Ho Chi Minh City University of Technology, Vietnam
Venue:
ASWC '08 Proceedings of the 3rd Asian Semantic Web Conference on The Semantic Web
Year:
2008

Abstract

The rapidly increasing use of large-scale data on the Web makes named entity disambiguation one of the main research challenges in Information Extraction and in the development of the Semantic Web. This paper presents a novel method for detecting proper names in a text and linking them to the correct entities in Wikipedia. The method is hybrid and has two phases: the first uses heuristics and patterns to narrow down the candidates, and the second uses the vector space model to rank the candidates of names that remain ambiguous and choose the right one. The novelty is that disambiguation is incremental: it runs in several rounds that filter the candidates by exploiting previously identified entities, and each time an entity is resolved in a round, the text is extended with that entity's attributes. We evaluate the method on disambiguating names of people, locations and organizations in news texts. The experimental results show that our approach achieves high accuracy and can be used to build a robust named entity disambiguation system.
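The abstract does not specify the heuristics, patterns, or term weighting used, so the following is only a minimal Python sketch of the general incremental idea under stated assumptions. It is not the authors' implementation. A toy dictionary (the hypothetical `EXAMPLE_KB`) stands in for Wikipedia. A "single candidate is accepted" rule stands in for the phase-one heuristics. Raw term-frequency cosine similarity stands in for the vector space model ranking. `margin` is a hypothetical threshold that a winning candidate must clear over the runner-up.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercased whitespace tokens with surrounding punctuation stripped."""
    tokens = (t.strip(".,;:()\"'").lower() for t in text.split())
    return [t for t in tokens if t]


def cosine(a, b):
    """Cosine similarity of two sparse term-frequency vectors (Counters)."""
    dot = sum(v * b[t] for t, v in a.items())  # terms missing from b count as 0
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def disambiguate(text, mentions, kb, margin=0.05, max_rounds=5):
    """Resolve mentions to KB entity ids over several incremental rounds.

    kb maps a surface name to {entity_id: attribute_text}. This KB layout,
    the single-candidate rule and the margin rule are illustrative assumptions.
    """
    context = Counter(tokenize(text))
    resolved = {}
    for _ in range(max_rounds):
        progress = False
        for name in mentions:
            if name in resolved:
                continue
            candidates = kb.get(name, {})
            if not candidates:
                continue
            if len(candidates) == 1:
                # Phase 1 stand-in: a name with a single candidate is accepted directly.
                entity = next(iter(candidates))
            else:
                # Phase 2 stand-in: rank candidates by cosine similarity to the context.
                ranked = sorted(
                    ((cosine(context, Counter(tokenize(desc))), eid)
                     for eid, desc in candidates.items()),
                    reverse=True,
                )
                (best, entity), (second, _) = ranked[0], ranked[1]
                if best - second < margin:
                    continue  # still ambiguous; retry in a later round with more context
            resolved[name] = entity
            # Incremental step: extend the context with the resolved entity's attributes.
            context.update(tokenize(candidates[entity]))
            progress = True
        if not progress:
            break
    return resolved


# Hypothetical toy data, not taken from the paper or from Wikipedia.
EXAMPLE_KB = {
    "Atlanta": {"Atlanta_GA": "capital city of Georgia US state"},
    "Georgia": {
        "Georgia_US": "US state in the southeastern United States",
        "Georgia_country": "country in the Caucasus with capital Tbilisi",
    },
}
EXAMPLE_TEXT = "The team flew to Georgia and later to Atlanta."

if __name__ == "__main__":
    print(disambiguate(EXAMPLE_TEXT, ["Georgia", "Atlanta"], EXAMPLE_KB))
```

On this toy input, "Georgia" ties between its two candidates in the first round and is deferred. "Atlanta" has a single candidate and is resolved, and its attributes ("US", "state", ...) are added to the context. In the second round those added terms let "Georgia" be resolved to `Georgia_US`. With `max_rounds=1`, only "Atlanta" is resolved, which illustrates why the extra rounds matter.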