Arabic cross-document person name normalization

Authors:
Walid Magdy;Kareem Darwish;Ossama Emam;Hany Hassan
Affiliations:
IBM Cairo Technology Development Center, Giza, Egypt (all four authors)
Venue:
Semitic '07 Proceedings of the 2007 Workshop on Computational Approaches to Semitic Languages: Common Issues and Resources
Year:
2007


Abstract

This paper presents a machine learning approach to cross-document named entity normalization that couples an SVM classifier with preprocessing rules. The classifier uses lexical, orthographic, phonetic, and morphological features. The task involves two subproblems: disambiguating distinct entities that share the same name mention, and normalizing a single entity that appears under different name mentions. Evaluated on cluster quality, the approach achieves a cluster F-measure of 0.93. It significantly outperforms two baselines, one in which no entities are normalized and one in which only entities with exactly matching name mentions are normalized; these baselines achieve cluster F-measures of 0.62 and 0.74, respectively. The classifier correctly normalizes the vast majority of entities that the baseline system misnormalizes.
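The abstract scores systems by a cluster F-measure without defining it here. The following is a minimal sketch of how such an evaluation might be wired up, under several assumptions that are not taken from the paper: pairwise "same entity" decisions (which the paper obtains from its SVM) are supplied by hand, mentions are grouped by the transitive closure of those decisions, and the cluster F-measure is the common class-size-weighted maximum F1 over predicted clusters. The helper names `clusters_from_links`, `exact_match_links`, and `cluster_f_measure`, and the transliterated toy names, are hypothetical. `exact_match_links` mimics the second baseline (link only identical name strings).

```python
from collections import defaultdict


def clusters_from_links(mentions, links):
    """Group mention ids by transitive closure of positive pairwise links (union-find).
    `links` stands in for pairs a classifier labels as the same entity (assumption)."""
    parent = {m: m for m in mentions}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    groups = defaultdict(set)
    for m in mentions:
        groups[find(m)].add(m)
    return list(groups.values())


def exact_match_links(names):
    """Baseline-style linking: connect each mention to the first mention with an identical string."""
    first_seen = {}
    links = []
    for mid, name in names.items():
        if name in first_seen:
            links.append((first_seen[name], mid))
        else:
            first_seen[name] = mid
    return links


def cluster_f_measure(gold, predicted):
    """Class-size-weighted best F1 of each gold cluster against any predicted cluster.
    One common definition; the paper's exact variant is assumed, not confirmed."""
    total = sum(len(g) for g in gold)
    score = 0.0
    for g in gold:
        best = 0.0
        for c in predicted:
            overlap = len(g & c)
            if overlap == 0:
                continue
            p, r = overlap / len(c), overlap / len(g)
            best = max(best, 2 * p * r / (p + r))
        score += len(g) / total * best
    return score


if __name__ == "__main__":
    # Toy data (hypothetical): two people, each written with spelling variants.
    names = {1: "Mohamed Ali", 2: "Mohamed Ali", 3: "Muhammad Ali",
             4: "Hosni Mubarak", 5: "Husni Mubarak"}
    gold = [{1, 2, 3}, {4, 5}]
    mentions = list(names)

    no_norm = clusters_from_links(mentions, [])
    exact = clusters_from_links(mentions, exact_match_links(names))
    ideal = clusters_from_links(mentions, [(1, 2), (2, 3), (4, 5)])

    print(f"{cluster_f_measure(gold, no_norm):.3f}")  # 0.567
    print(f"{cluster_f_measure(gold, exact):.3f}")    # 0.747
    print(f"{cluster_f_measure(gold, ideal):.3f}")    # 1.000
```

These toy scores are illustrative only and are unrelated to the paper's reported 0.62, 0.74, and 0.93. The example shows why exact matching helps but falls short: it merges the identical strings "Mohamed Ali" yet leaves "Muhammad Ali" and both "Mubarak" variants unlinked, which is the gap that a classifier over lexical, orthographic, phonetic, and morphological features is meant to close.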