A Comparison of Performance of Sequential Learning Algorithms on the Task of Named Entity Recognition for Indian Languages

Authors:
Awaghad Ashish Krishnarao;Himanshu Gahlot;Amit Srinet;D. S. Kushwaha
Affiliations:
Motilal Nehru National Institute of Technology, Allahabad, India 211004;Motilal Nehru National Institute of Technology, Allahabad, India 211004;Motilal Nehru National Institute of Technology, Allahabad, India 211004;Motilal Nehru National Institute of Technology, Allahabad, India 211004
Venue:
ICCS '09 Proceedings of the 9th International Conference on Computational Science: Part I
Year:
2009

Abstract

We take up named entity recognition for Indian languages through a comparative study of two sequential learning algorithms, Conditional Random Fields (CRF) and Support Vector Machines (SVM). Although we report results only for Hindi, the features used are language independent, so the same procedure could be applied to tag named entities in other Indian languages such as Telugu, Bengali, and Marathi, which have the same number of vowels and consonants. We used CRF++ to implement the CRF algorithm and YamCha to implement the SVM algorithm. The results show that CRF outperforms SVM and are only slightly lower than the best results reported for this task. The gap can be attributed to the absence of any pre-processing or post-processing steps. The system uses the contextual information of words along with various language-independent features to label the named entities (NEs).
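
The abstract does not list the exact features, so the following is only a minimal sketch of what "contextual information of words along with language independent features" could look like in practice. The window size, the affix length, the boundary markers `<BOS>`/`<EOS>`, and the function names `token_features` and `sentence_features` are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int,
                   window: int = 2, affix_len: int = 3) -> Dict[str, str]:
    """Build a feature dictionary for tokens[i].

    Hypothetical feature set: surrounding words within `window` positions,
    plus character prefix, suffix, and length of the current word. None of
    these rely on language-specific resources such as gazetteers.
    """
    feats: Dict[str, str] = {}

    # Contextual features: the words at offsets -window .. +window.
    for offset in range(-window, window + 1):
        j = i + offset
        if j < 0:
            word = "<BOS>"   # assumed marker for positions before the sentence
        elif j >= len(tokens):
            word = "<EOS>"   # assumed marker for positions after the sentence
        else:
            word = tokens[j]
        feats[f"w[{offset}]"] = word

    # Surface features of the current word. Slicing works on Unicode code
    # points, so for Devanagari text it may cut through a syllable.
    current = tokens[i]
    feats["prefix"] = current[:affix_len]
    feats["suffix"] = current[-affix_len:]
    feats["length"] = str(len(current))
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, str]]:
    """Feature dictionaries for every token in one sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]
```

For a tokenised sentence, `sentence_features(tokens)` returns one dictionary per word. A sequence labeller such as a CRF or an SVM-based chunker would then be trained on these features paired with per-token entity labels.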