A hidden Markov model based named entity recognition system: Bengali and Hindi as case studies

Authors:
Asif Ekbal;Sivaji Bandyopadhyay
Affiliations:
Computer Science and Engineering Department, Jadavpur University, Kolkata, India;Computer Science and Engineering Department, Jadavpur University, Kolkata, India
Venue:
PReMI'07 Proceedings of the 2nd international conference on Pattern recognition and machine intelligence
Year:
2007


Abstract

Named Entity Recognition (NER) plays an important role in almost all Natural Language Processing (NLP) application areas, including information retrieval, machine translation, question answering and automatic summarization. This paper reports on the development of a statistical Hidden Markov Model (HMM) based NER system. The system is initially developed for Bengali using a tagged Bengali news corpus, built from the archive of a leading Bengali newspaper available on the web. The system is trained on a corpus of 150,000 wordforms, initially tagged with an HMM based part of speech (POS) tagger. A 10-fold cross validation test yields average Recall, Precision and F-Score values of 90.2%, 79.48% and 84.5%, respectively. The HMM based NER system is then trained and tested on Hindi data to demonstrate that the approach is language independent. On 27,151 Hindi wordforms, a 10-fold cross validation test yields average Recall, Precision and F-Score values of 82.5%, 74.6% and 78.35%, respectively.
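
The reported F-Score values can be related to the reported Recall and Precision values with a short calculation. The sketch below assumes the balanced F-Score (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_score` and the `reported` dictionary are illustrative and not taken from the paper. Note also that an F-Score averaged over folds need not equal the F-Score computed from averaged precision and recall, so this is only a consistency check.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall.

    Inputs and output share the same unit (here: percent).
    Assumption: the abstract uses the balanced (beta = 1) form.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Average values quoted in the abstract, in percent.
reported = {
    "Bengali": {"recall": 90.2, "precision": 79.48, "f_score": 84.5},
    "Hindi": {"recall": 82.5, "precision": 74.6, "f_score": 78.35},
}

for language, r in reported.items():
    computed = f_score(r["precision"], r["recall"])
    print(f"{language}: computed {computed:.2f}, reported {r['f_score']}")
```

Under this assumption, the computed values agree with the reported F-Scores to two decimal places for both languages.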