Combining multiple classifiers using vote based classifier ensemble technique for named entity recognition

Authors:
Sriparna Saha; Asif Ekbal
Venue:
Data & Knowledge Engineering
Year:
2013

Abstract

In this paper, we pose the classifier ensemble problem under single-objective and multiobjective optimization frameworks and evaluate it for Named Entity Recognition (NER), an important step in almost all Natural Language Processing (NLP) application areas. For each optimization framework, we propose solutions to two versions of the ensemble problem. We hypothesize that the reliability of each classifier's predictions differs among the output classes. Thus, in an ensemble system it is necessary either to determine the classes for which each classifier is eligible to vote (binary vote based ensemble) or to quantify the weight of each classifier's vote for each class (real vote based ensemble). We use seven diverse classifiers, namely Naive Bayes, Decision Tree (DT), Memory Based Learner (MBL), Hidden Markov Model (HMM), Maximum Entropy (ME), Conditional Random Field (CRF) and Support Vector Machine (SVM), to build a number of models that differ in their representations of the available features. These features are identified and selected mostly without using any domain knowledge or language-specific resources. The proposed technique is evaluated for three resource-constrained languages: Bengali, Hindi and Telugu. The multiobjective optimization (MOO) based technique yields overall recall, precision and F-measure values of 94.21%, 94.72% and 94.74%, respectively, for Bengali; 99.07%, 90.63% and 94.66%, respectively, for Hindi; and 82.79%, 95.18% and 88.55%, respectively, for Telugu. For all three languages, the proposed MOO based classifier ensemble with real voting outperforms all the individual classifiers, three baseline ensembles and the corresponding single objective based ensemble.
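The difference between binary and real voting can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `combine_votes`, the three-classifier setup, the class labels and all weight values are hypothetical and hand-picked for illustration, whereas in the paper the vote assignments are found by single-objective or multiobjective optimization. The sketch only assumes that each classifier's predicted class receives that classifier's weight for that class and that the class with the highest total wins.

```python
from collections import defaultdict

def combine_votes(predictions, weights):
    """Combine one token's predictions from several classifiers.

    predictions: list of class labels, one per classifier.
    weights: list of dicts; weights[m][c] is the vote weight that
             classifier m carries when it predicts class c
             (0 or 1 for binary voting, any real value for real voting).
    """
    scores = defaultdict(float)
    for m, label in enumerate(predictions):
        # A classifier contributes only to the class it predicted.
        scores[label] += weights[m].get(label, 0.0)
    # Highest total wins; ties are broken alphabetically for determinism.
    return max(sorted(scores), key=lambda c: scores[c])

# Hypothetical predictions of three classifiers for a single token.
preds = ["PER", "LOC", "LOC"]

# Plain majority voting: every classifier votes for every class.
majority = [{"PER": 1, "LOC": 1}, {"PER": 1, "LOC": 1}, {"PER": 1, "LOC": 1}]

# Binary voting: classifiers 1 and 2 are not eligible to vote for LOC.
binary = [{"PER": 1, "LOC": 1}, {"PER": 1, "LOC": 0}, {"PER": 0, "LOC": 0}]

# Real voting: each (classifier, class) pair has its own weight.
real = [{"PER": 0.9, "LOC": 0.2}, {"PER": 0.5, "LOC": 0.3}, {"PER": 0.1, "LOC": 0.4}]

print(combine_votes(preds, majority))  # LOC
print(combine_votes(preds, binary))    # PER
print(combine_votes(preds, real))      # PER
```

With these made-up weights, plain majority voting follows the two LOC predictions, while both the binary and the real scheme let the single classifier that is trusted for PER override them. This is the kind of class-specific reliability the hypothesis in the abstract refers to.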