Name Disambiguation Boosted by Latent Topics from Web Directories

Authors:
Quang Minh Vu;Atsuhiro Takasu;Jun Adachi
Venue:
WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
Year:
2008

Abstract

Search results for a personal name query often mix documents about several different people, because many people share the same name. To tell these people apart, we need to extract from each document the context that is relevant to the person it mentions. However, web documents are noisy and the text about a person may be short, so extracting that context effectively is difficult. We propose a new method that uses web directories as additional information, making it easier to recognize topic terms in documents and to extract the contexts of people more effectively. First, we apply latent Dirichlet allocation (LDA) to extract latent topics from web directories. The extracted topics are then used to recognize the topics present in the name-ambiguous documents, so that measurements of common context between documents can be calculated more effectively. Our experiments, conducted with documents about real people on the web and several well-known web directories, show that our approach disambiguates personal names better than conventional approaches such as the vector space model approach and the named entity recognition approach.
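
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the inference procedure, the features, or the similarity measure, so the collapsed Gibbs sampler, the fixed-topic fold-in step, the cosine similarity, all function names, and the toy directory and page texts below are assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def train_lda(docs, n_topics, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-topic word probabilities."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_kw = [Counter() for _ in range(n_topics)]   # topic-word counts
    n_k = [0] * n_topics                          # tokens assigned to each topic
    n_dk = [[0] * n_topics for _ in docs]         # document-topic counts
    z = []                                        # topic assigned to each token
    for d, doc in enumerate(docs):
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            n_kw[k][w] += 1
            n_k[k] += 1
            n_dk[d][k] += 1
    v_beta = len(vocab) * beta
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                       # remove the current assignment
                n_kw[k][w] -= 1
                n_k[k] -= 1
                n_dk[d][k] -= 1
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + v_beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k                       # record the resampled topic
                n_kw[k][w] += 1
                n_k[k] += 1
                n_dk[d][k] += 1
    return [
        {w: (n_kw[k][w] + beta) / (n_k[k] + v_beta) for w in vocab}
        for k in range(n_topics)
    ]


def infer_topics(doc, phi, alpha=0.1, n_iter=50):
    """Estimate a page's topic mixture while keeping the learned topics fixed."""
    n_topics = len(phi)
    words = [w for w in doc if w in phi[0]]       # ignore out-of-vocabulary words
    theta = [1.0 / n_topics] * n_topics
    for _ in range(n_iter):
        counts = [0.0] * n_topics
        for w in words:
            resp = [theta[k] * phi[k][w] for k in range(n_topics)]
            total = sum(resp)
            for k in range(n_topics):
                counts[k] += resp[k] / total
        theta = [(c + alpha) / (len(words) + n_topics * alpha) for c in counts]
    return theta


def cosine(a, b):
    """Cosine similarity between two topic-mixture vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Hypothetical stand-ins for web directory pages (step 1: learn latent topics).
directory_docs = [
    "guitar band album concert tour music".split(),
    "music album singer band song concert".split(),
    "protein gene cell biology research lab".split(),
    "research gene biology cell genome lab".split(),
]
phi = train_lda(directory_docs, n_topics=2)

# Hypothetical pages returned for one ambiguous name (step 2: infer and compare).
pages = {
    "page_a": "new album and concert tour announced by the band",
    "page_b": "the singer released a song with her band",
    "page_c": "her lab studies gene regulation in the cell",
}
theta = {name: infer_topics(text.split(), phi) for name, text in pages.items()}
for left, right in [("page_a", "page_b"), ("page_a", "page_c")]:
    print(left, right, round(cosine(theta[left], theta[right]), 3))
```

In this sketch, pages whose topic mixtures are similar would be grouped as referring to the same person. A clustering step over the pairwise similarities, which the abstract does not detail, would complete the pipeline.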