Improving the performance of personal name disambiguation using web directories
Information Processing and Management: an International Journal
Search results for personal name queries often mix documents about several different people, because many people share the same name. Differentiating the people in these results requires extracting, from each document, the context relevant to the person it mentions. However, web documents are noisy and the text about a person may be short, so these contexts are difficult to extract effectively. We propose a new method that uses web directories as additional information to recognize topic terms in documents more easily and to extract the contexts of people more effectively. First, we apply latent Dirichlet allocation (LDA) to extract latent topics from web directories. The extracted topics are then used to recognize the topics in documents containing ambiguous names, so that common context measurements can be calculated more effectively. Our experiments, conducted with documents about real people on the web and several well-known web directories, show that our approach disambiguates personal names better than some conventional approaches, such as the vector space model approach and the named entity recognition approach.
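The two-step idea in the abstract (learn topics from web directories, then describe each ambiguous-name document by those topics) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy directory entries and pages, the collapsed Gibbs sampler used to fit LDA, the fold-in inference for new pages, the hyperparameters, and cosine similarity as a stand-in for the context measurement.

```python
import math
import random


def sample_index(weights, rng):
    """Draw an index with probability proportional to its weight."""
    r = rng.random() * sum(weights)
    for k, w in enumerate(weights):
        r -= w
        if r < 0:
            return k
    return len(weights) - 1


def train_lda(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return topic-word counts and topic totals."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    topic_total = [0] * num_topics
    doc_topic = [[0] * num_topics for _ in docs]
    assign = []
    for d, doc in enumerate(docs):  # random initial topic for every token
        zs = [int(rng.random() * num_topics) for _ in doc]
        for w, z in zip(doc, zs):
            topic_word[z][w] += 1
            topic_total[z] += 1
            doc_topic[d][z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]  # remove the token, then resample its topic
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                doc_topic[d][z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha) * (topic_word[k][w] + beta)
                    / (topic_total[k] + len(vocab) * beta)
                    for k in range(num_topics)
                ]
                z = sample_index(weights, rng)
                assign[d][i] = z
                topic_word[z][w] += 1
                topic_total[z] += 1
                doc_topic[d][z] += 1
    return topic_word, topic_total


def infer_topics(doc, topic_word, topic_total, iters=50, alpha=0.1, beta=0.01, seed=0):
    """Estimate a topic mixture for a new document, keeping learned topics fixed."""
    rng = random.Random(seed)
    num_topics, vocab_size = len(topic_word), len(topic_word[0])
    words = [w for w in doc if w in topic_word[0]]  # skip out-of-vocabulary words
    zs = [int(rng.random() * num_topics) for _ in words]
    counts = [zs.count(k) for k in range(num_topics)]
    for _ in range(iters):
        for i, w in enumerate(words):
            counts[zs[i]] -= 1
            weights = [
                (counts[k] + alpha) * (topic_word[k][w] + beta)
                / (topic_total[k] + vocab_size * beta)
                for k in range(num_topics)
            ]
            zs[i] = sample_index(weights, rng)
            counts[zs[i]] += 1
    total = len(words) + num_topics * alpha
    return [(c + alpha) / total for c in counts]


def cosine(u, v):
    """Cosine similarity between two topic mixtures (all entries are positive)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical web-directory entries, already tokenized.
directory = [
    "football league team coach season match".split(),
    "team player match goal season stadium".split(),
    "software compiler programming language code".split(),
    "code programming database software server".split(),
]
# Hypothetical pages returned for one ambiguous personal name.
pages = [
    "smith coach team season".split(),
    "smith scored goal match stadium".split(),
    "smith wrote compiler software code".split(),
]

topic_word, topic_total = train_lda(directory, num_topics=2)
thetas = [infer_topics(p, topic_word, topic_total) for p in pages]
for i in range(len(pages)):
    for j in range(i + 1, len(pages)):
        print(i, j, round(cosine(thetas[i], thetas[j]), 3))
```

In a fuller pipeline, the pairwise similarities would feed a clustering step that groups pages by person. The sketch stops at the similarity scores because the abstract does not specify which clustering procedure is used.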