Optimizing search engines using clickthrough data
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
A statistical profile of the Named Entity task
ANLC '97 Proceedings of the fifth conference on Applied natural language processing
Location normalization for information extraction
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Language model based arabic word segmentation
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Unsupervised personal name disambiguation
CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Improving name tagging by reference resolution and relation detection
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Named entity normalization in user generated content
Proceedings of the second workshop on Analytics for noisy unstructured text data
The impact of named entity normalization on information retrieval for question answering
ECIR'08 Proceedings of the IR research, 30th European conference on Advances in information retrieval
Entity clustering across languages
NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Joint inference of named entity recognition and normalization for tweets
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Hi-index | 0.00 |
This paper presents a machine learning approach based on an SVM classifier coupled with preprocessing rules for cross-document named entity normalization. The classifier uses lexical, orthographic, phonetic, and morphological features. The process involves disambiguating different entities with shared name mentions and normalizing identical entities with different name mentions. In evaluating the quality of the clusters, the reported approach achieves a cluster F-measure of 0.93. The approach is significantly better than the two baseline approaches in which none of the entities are normalized or entities with exact name mentions are normalized. The two baseline approaches achieve cluster F-measures of 0.62 and 0.74 respectively. The classifier properly normalizes the vast majority of entities that are misnormalized by the baseline system.