An Algorithm that Learns What‘s in a Name
Machine Learning - Special issue on natural language learning
A maximum entropy approach to named entity recognition
A maximum entropy approach to named entity recognition
TnT: a statistical part-of-speech tagger
ANLC '00 Proceedings of the sixth conference on Applied natural language processing
ACM Transactions on Asian Language Information Processing (TALIP)
Named entity recognition using an HMM-based chunk tagger
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Improving machine translation quality with automatic named entity recognition
EAMT '03 Proceedings of the 7th International EAMT workshop on MT and other Language Technology Tools, Improving MT through other Language Technology Tools: Resources and Tools for Building MT
NE tagging for Urdu based on bootstrap POS learning
CLIAWS3 '09 Proceedings of the Third International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies
An Information-Extraction System for Urdu---A Resource-Poor Language
ACM Transactions on Asian Language Information Processing (TALIP)
Improving toponym extraction and disambiguation using feedback loop
ICWE'12 Proceedings of the 12th international conference on Web Engineering
When speed has a price: fast information extraction using approximate algorithms
Proceedings of the VLDB Endowment
Hi-index | 0.00 |
Named Entity Recognition (NER) has an important role in almost all Natural Language Processing (NLP) application areas including information retrieval, machine translation, question-answering system, automatic summarization etc. This paper reports about the development of a statistical Hidden Markov Model (HMM) based NER system. The system is initially developed for Bengali using a tagged Bengali news corpus, developed from the archive of a leading Bengali newspaper available in the web. The system is trained with a training corpus of 150,000 wordforms, initially tagged with a HMM based part of speech (POS) tagger. Evaluation results of the 10-fold cross validation test yield an average Recall, Precision and F-Score values of 90.2%, 79.48% and 84.5%, respectively. This HMM based NER system is then trained and tested on the Hindi data to show its effectiveness towards the language independent abilities. Experimental results of the 10-fold cross validation test has demonstrated the average Recall, Precision and F-Score values of 82.5%, 74.6% and 78.35%, respectively with 27,151 Hindi wordforms.