Voted NER system using appropriate unlabeled data

Authors:
Asif Ekbal; Sivaji Bandyopadhyay
Affiliations:
Jadavpur University, Kolkata, India; Jadavpur University, Kolkata, India
Venue:
NEWS '09 Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration
Year:
2009


Abstract

This paper reports a voted Named Entity Recognition (NER) system that makes use of appropriate unlabeled data. The proposed method combines Maximum Entropy (ME), Conditional Random Field (CRF), and Support Vector Machine (SVM) classifiers and has been tested on Bengali. The system uses language-independent features, in the form of contextual and orthographic word-level features, along with language-dependent features extracted from a Part of Speech (POS) tagger and gazetteers. Context patterns generated from the unlabeled data using an active learning method serve as features in each of the classifiers. The paper describes a semi-supervised method with measures for automatically selecting effective documents and sentences from the unlabeled data. Finally, the models are combined into a single system by a weighted voting technique. Experimental results show the effectiveness of the proposed approach, with overall Recall, Precision, and F-Score values of 93.81%, 92.18%, and 92.98%, respectively. We also show how the language-dependent features can improve system performance.
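
The combination step can be illustrated with a minimal sketch of token-level weighted voting. The abstract does not say how the weights are derived or how ties are resolved, so everything specific here is an assumption made for illustration: the function names `weighted_vote` and `f_score`, the tag labels, the weight values, and the tie-breaking rule are all hypothetical and not taken from the paper.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine per-token tag sequences from several classifiers.

    predictions: dict mapping a classifier name to its list of predicted tags.
    weights:     dict mapping a classifier name to a non-negative weight
                 (assumed here; the paper's weighting scheme is not given
                 in the abstract).
    Returns one tag per token: the tag with the highest summed weight.
    """
    lengths = {len(tags) for tags in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all classifiers must tag the same number of tokens")
    n_tokens = lengths.pop()

    combined = []
    for i in range(n_tokens):
        scores = defaultdict(float)
        for name, tags in predictions.items():
            scores[tags[i]] += weights[name]
        # Ties go to the tag that was proposed first (dict insertion order);
        # this tie-breaking rule is an assumption of the sketch.
        combined.append(max(scores, key=scores.get))
    return combined


def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical outputs of the three classifiers for a three-token sentence.
predictions = {
    "ME":  ["B-PER", "O", "B-LOC"],
    "CRF": ["B-PER", "O", "O"],
    "SVM": ["O",     "O", "B-LOC"],
}
# Hypothetical weights, e.g. each model's score on held-out data.
weights = {"ME": 0.80, "CRF": 0.90, "SVM": 0.85}

print(weighted_vote(predictions, weights))
print(round(f_score(92.18, 93.81), 2))
```

For the third token, the two models voting B-LOC carry a combined weight of 1.65 against 0.90 for O, so B-LOC wins even though the highest-weighted single model disagrees. Applying `f_score` to the reported Precision and Recall gives roughly 92.99, in line with the reported F-Score of 92.98% up to rounding.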