Automatic discovery of named entity variants: grammar-driven approaches to non-alphabetical transliterations

  • Authors:
  • Chu-Ren Huang; Petr Šimon; Shu-Kai Hsieh

  • Affiliations:
  • Institute of Linguistics, Academia Sinica, Taiwan; Institute of Linguistics, Academia Sinica, Taiwan; DoFLAL, NIU, Taiwan

  • Venue:
  • ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
  • Year:
  • 2007

Abstract

Identifying transliterated names is a particularly difficult task in Named Entity Recognition (NER), especially in the Chinese context. Of all the possible variations in transliterated named entities, the divergence between PRC and Taiwan transliterations is the most prevalent and the most challenging. In this paper, we introduce a novel approach to the automatic extraction of diverging transliterations of foreign named entities by bootstrapping co-occurrence statistics from a tagged and segmented Chinese corpus. A preliminary experiment yields promising results and shows the approach's potential for NLP applications.
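The abstract gives only a high-level view of the method. To show what "co-occurrence statistics" can mean for this task, here is a minimal sketch of one generic technique. It is not the authors' algorithm. A known transliteration from one corpus is compared with candidate names from another corpus, and the candidates are ranked by the cosine similarity of their context-word counts. All function names, the window size, and the toy sentences are hypothetical. The sketch assumes both corpora are already segmented and converted to the same script. It uses 布什 (PRC) and 布希 (Taiwan), the two renderings of "Bush".

```python
from collections import Counter
from math import sqrt

def context_vector(sentences, target, window=2):
    """Count tokens co-occurring with `target` within `window` positions (hypothetical helper)."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_variants(source_sents, source_name, target_sents, candidates, window=2):
    """Rank candidate variants in the target corpus by context similarity to source_name."""
    src = context_vector(source_sents, source_name, window)
    scored = [(c, cosine(src, context_vector(target_sents, c, window))) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy, pre-segmented sentences (hypothetical; assumed already converted to one script).
prc = [["美国", "总统", "布什", "访问", "中国"]]
tw = [["美国", "总统", "布希", "访问", "中国"], ["台北", "天气", "晴朗"]]

ranking = rank_variants(prc, "布什", tw, ["布希", "台北"])
print(ranking)  # [('布希', 1.0), ('台北', 0.0)]
```

In a realistic setting, candidates would be restricted to tokens tagged as named entities. Context similarity alone is a weak signal, which is presumably why the title emphasizes grammar-driven constraints on the transliterations themselves.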