Rule-based named entity recognition in Urdu

Authors: Kashif Riaz
Affiliation: University of Minnesota, MN
Venue: NEWS '10 Proceedings of the 2010 Named Entities Workshop
Year: 2010

Abstract

Named Entity Recognition or Extraction (NER) is an important task in automated text processing for industry and academia engaged in language processing, intelligence gathering, and bioinformatics. In this paper we discuss the general problem of Named Entity Recognition and, more specifically, the challenges of NER in languages that lack language resources such as large annotated corpora. We address the challenges specific to Urdu NER and differentiate Urdu from other South Asian (Indic) languages. We discuss the differences between Hindi and Urdu and conclude that the NER computational models for Hindi cannot be applied to Urdu. We present a rule-based Urdu NER algorithm that outperforms the models that use statistical learning.