Improving the Performance of a NER System by Post-processing and Voting

Authors:
Asif Ekbal;Sivaji Bandyopadhyay
Affiliations:
Department of Computer Science and Engineering, Jadavpur University, Kolkata, India 700032;Department of Computer Science and Engineering, Jadavpur University, Kolkata, India 700032
Venue:
SSPR & SPR '08 Proceedings of the 2008 Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition
Year:
2008

Abstract

This paper reports on the development of a named entity recognition (NER) system for Bengali that combines the outputs of Maximum Entropy (ME), Conditional Random Field (CRF), and Support Vector Machine (SVM) classifiers. The training set consists of approximately 250K wordforms and has been manually annotated with four major named entity (NE) tags: Person, Location, Organization, and Miscellaneous. The classifiers use contextual information about the words together with a variety of features that help predict the NE classes. Lexical context patterns, generated semi-automatically from an unlabeled corpus of 1 million wordforms, are used as classifier features to improve performance. In addition, we use the second-best tags of the classifiers and apply several heuristics to improve performance further. Finally, the classifiers are combined using majority voting. Experimental results show the effectiveness of the proposed approach, with overall average recall, precision, and f-score values of 90.78%, 87.35%, and 89.03%, respectively. This is an improvement of 11.8% in f-score over the best-performing SVM-based baseline system and of 15.11% in f-score over the lowest-performing ME-based baseline system. The proposed system also outperforms the other existing Bengali NER system.
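As an illustration only, the majority voting step and the f-score arithmetic can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function names `majority_vote` and `f_score` are hypothetical, the tag strings are made-up examples, and the tie-breaking rule (fall back to the SVM tag when all three classifiers disagree) is an assumption, since the abstract does not say how ties are resolved. The f-score is taken to be the usual harmonic mean of precision and recall.

```python
from collections import Counter


def majority_vote(me_tags, crf_tags, svm_tags):
    """Combine per-token tags from three classifiers by majority vote."""
    if not (len(me_tags) == len(crf_tags) == len(svm_tags)):
        raise ValueError("tag sequences must have equal length")
    combined = []
    for me, crf, svm in zip(me_tags, crf_tags, svm_tags):
        tag, count = Counter([me, crf, svm]).most_common(1)[0]
        # Assumption: when all three classifiers disagree, fall back to
        # the SVM tag. The abstract does not specify a tie-breaking rule.
        combined.append(tag if count >= 2 else svm)
    return combined


def f_score(recall, precision):
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical tags for a three-token sentence.
me_out = ["PER", "O", "LOC"]
crf_out = ["PER", "ORG", "O"]
svm_out = ["O", "ORG", "MISC"]
print(majority_vote(me_out, crf_out, svm_out))

# Recall and precision as reported in the abstract.
print(round(f_score(90.78, 87.35), 2))
```

With the reported recall of 90.78% and precision of 87.35%, the harmonic mean comes to about 89.03%, which matches the f-score stated in the abstract.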