Named Entity Recognition Experiments on Turkish Texts

  • Authors:
  • Dilek Küçük; Adnan Yazıcı

  • Affiliations:
  • Power Electronics Group, TÜBİTAK - Uzay Institute, Ankara, Turkey 06531; Department of Computer Engineering, Middle East Technical University, Ankara, Turkey 06531

  • Venue:
  • FQAS '09 Proceedings of the 8th International Conference on Flexible Query Answering Systems
  • Year:
  • 2009

Abstract

Named entity recognition (NER) is one of the main information extraction tasks, yet research on NER for Turkish texts is rare. In this study, we present a rule-based NER system for Turkish that employs a set of lexical resources and pattern bases to extract named entities: the names of people, locations, and organizations, together with time/date and money/percentage expressions. The system targets news texts and does not rely on capitalization or punctuation, although these are important clues, since they may be missing in texts obtained from the Web or in the output of automatic speech recognition tools. The system is evaluated on news texts as well as on other genres, namely child stories and historical texts. As expected for manually engineered rule-based systems, its performance degrades on these latter genres because they differ from the target domain of news texts. The system is also evaluated on transcriptions of news videos with satisfactory results, which is an important step towards employing NER in the automatic semantic annotation of videos in Turkish. The study is significant as the first rule-based approach to the NER task on Turkish texts, with an evaluation on diverse text types.
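
To make the general idea concrete, the following is a minimal sketch of how a case-insensitive recognizer built from lexical resources and pattern bases might look. It is not the authors' system: the lexicon entries, the three regular expressions, the label names, and the functions `turkish_lower` and `recognize` are all hypothetical and chosen only for illustration. The sketch lowercases the input with Turkish-aware handling of dotted and dotless i, so that it never depends on capitalization, and then applies the patterns and the lexicon lookups.

```python
import re


def turkish_lower(text):
    # Python's str.lower() maps "I" to "i"; Turkish needs "I" -> "ı" and "İ" -> "i".
    return text.replace("İ", "i").replace("I", "ı").lower()


# Hypothetical toy lexical resources, stored lowercased (not the paper's resources).
LEXICON = {
    "PERSON": {"ahmet", "ayşe"},
    "LOCATION": {"ankara", "izmir"},
}

# Hypothetical toy pattern base: (label, compiled pattern) pairs.
PATTERNS = [
    ("ORGANIZATION", re.compile(r"\w+ (?:üniversitesi|bakanlığı)")),
    ("PERCENT", re.compile(r"yüzde \d+")),
    ("MONEY", re.compile(r"\d+ (?:lira|dolar|euro)")),
]


def recognize(text):
    """Return (surface form, label) pairs found in text, ignoring case."""
    text = turkish_lower(text)
    entities = []
    # Pattern-based extraction for multi-token expressions.
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            entities.append((match.group(0), label))
    # Lexicon lookup on single tokens; overlaps with pattern matches
    # are not resolved in this sketch.
    for token in re.findall(r"\w+", text):
        for label, names in LEXICON.items():
            if token in names:
                entities.append((token, label))
    return entities


if __name__ == "__main__":
    print(recognize("Ayşe İzmir'de 500 lira harcadı"))
```

A real system would also need to handle Turkish suffixation (for example, a location name written together with a case suffix when the apostrophe is missing) and to resolve overlapping matches, both of which this sketch leaves out.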