Named Entity Recognition Experiments on Turkish Texts

Authors:
Dilek Küçük; Adnan Yazıcı
Affiliations:
Power Electronics Group, TÜBİTAK - Uzay Institute, Ankara, Turkey 06531; Department of Computer Engineering, Middle East Technical University, Ankara, Turkey 06531
Venue:
FQAS '09 Proceedings of the 8th International Conference on Flexible Query Answering Systems
Year:
2009

Abstract

Named entity recognition (NER) is one of the main information extraction tasks, yet research on NER for Turkish texts is scarce. In this study, we present a rule-based NER system for Turkish that employs a set of lexical resources and pattern bases to extract named entities: the names of people, locations, and organizations, together with time/date and money/percentage expressions. The target domain of the system is news texts. It does not rely on capitalization and punctuation, although they are important clues, since they may be missing in texts obtained from the Web or in the output of automatic speech recognition tools. The system is evaluated on news texts and on two other genres, child stories and historical texts. As expected for a manually engineered rule-based system, its performance degrades on the latter genres, since they differ from the target domain of news texts. The system is also evaluated on transcriptions of news videos, with satisfactory results, which is an important step towards employing NER in the automatic semantic annotation of videos in Turkish. The study is significant as the first rule-based approach to the NER task on Turkish texts, evaluated on diverse text types.
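
To make the general idea concrete, here is a minimal sketch of case-insensitive, rule-based extraction that combines lexical resources (gazetteers) with a pattern base. It is not the authors' system: the gazetteer entries, the regular expressions, and the names `turkish_lower`, `GAZETTEERS`, `PATTERNS`, and `extract_entities` are all hypothetical, chosen only for illustration. The sketch also ignores Turkish morphology, which a real system would need to handle.

```python
import re


def turkish_lower(text):
    # Python's str.lower() does not apply Turkish casing rules, so the
    # dotted/dotless I pair is mapped by hand before lowercasing the rest.
    return text.replace("İ", "i").replace("I", "ı").lower()


# Hypothetical toy lexical resources (not taken from the paper).
GAZETTEERS = {
    "PERSON": {"ahmet", "ayşe"},
    "LOCATION": {"ankara", "istanbul"},
}

# Hypothetical toy pattern base, applied to lowercased text so that
# capitalization is never used as a clue.
PATTERNS = [
    ("PERCENT", re.compile(r"yüzde \d+")),        # e.g. "yüzde 20"
    ("MONEY", re.compile(r"\d+ (?:lira|dolar)")),  # e.g. "500 lira"
]


def extract_entities(text):
    """Return (label, matched_text) pairs found in text."""
    lowered = turkish_lower(text)
    entities = []
    # Gazetteer lookup on word tokens; punctuation is discarded.
    for token in re.findall(r"\w+", lowered):
        for label, names in GAZETTEERS.items():
            if token in names:
                entities.append((label, token))
    # Pattern-based extraction for numeric expressions.
    for label, pattern in PATTERNS:
        for match in pattern.finditer(lowered):
            entities.append((label, match.group()))
    return entities


print(extract_entities("AHMET ANKARA'DA YÜZDE 20 ZAM İSTEDİ"))
```

Because all matching runs on lowercased text, the sketch behaves the same on all-caps input, such as speech recognizer output, as on normally cased text. Suffixed forms written without an apostrophe (for example "ankarada") are not matched here.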