Full Length Article: Simulated annealing based classifier ensemble techniques: Application to part of speech tagging

  • Authors:
  • Asif Ekbal;Sriparna Saha

  • Affiliations:
  • Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, India;Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, India

  • Venue:
  • Information Fusion
  • Year:
  • 2013

Abstract

Part-of-Speech (PoS) tagging is an important pipeline module in almost all Natural Language Processing (NLP) applications. In this paper we formulate PoS tagging within the frameworks of single-objective and multi-objective optimization. As a first step, we propose a classifier ensemble technique for PoS tagging based on single-objective optimization (SOO) that exploits the search capability of simulated annealing (SA). We then devise a method based on multi-objective optimization (MOO) to solve the same problem, using AMOSA, a recently developed multi-objective simulated annealing technique. The characteristic features of AMOSA are its use of the amount of domination, its archive of solutions, and its situation-specific acceptance probabilities. We use Conditional Random Field (CRF) and Support Vector Machine (SVM) as the underlying classification methods, which make use of a diverse set of features based mostly on local contexts and orthographic constructs. We evaluate the proposed approaches on two Indian languages, Bengali and Hindi. The single-objective version achieves overall accuracies of 88.92% for Bengali and 87.67% for Hindi. The MOO-based ensemble yields overall accuracies of 90.45% and 89.88% for Bengali and Hindi, respectively.
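
The single-objective idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how an ensemble is encoded or how classifier outputs are combined, so everything below is an assumption made for illustration: a candidate ensemble is a binary mask over the base classifiers, outputs are combined by plain majority voting, the objective is token-level accuracy on held-out data, and the cooling schedule is geometric. The function names (`majority_vote`, `accuracy`, `anneal_ensemble`) and all parameter values are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter


def majority_vote(predictions, mask):
    """Combine the tag sequences of the selected classifiers by majority voting.

    Ties go to the tag proposed by the earliest selected classifier.
    """
    chosen = [p for p, keep in zip(predictions, mask) if keep]
    n_tokens = len(chosen[0])
    return [Counter(p[i] for p in chosen).most_common(1)[0][0]
            for i in range(n_tokens)]


def accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def anneal_ensemble(predictions, gold, t_start=1.0, t_min=0.01,
                    alpha=0.9, iters_per_temp=20, seed=0):
    """Search for a classifier subset with simulated annealing (assumed setup).

    predictions: one tag sequence per base classifier (e.g. CRF or SVM models
    trained with different feature sets), all aligned with `gold`.
    """
    rng = random.Random(seed)
    n = len(predictions)
    current = [True] * n  # start from the full ensemble
    current_score = accuracy(majority_vote(predictions, current), gold)
    best, best_score = current[:], current_score
    t = t_start
    while t > t_min:
        for _ in range(iters_per_temp):
            # Perturb the current solution: flip one classifier in or out.
            candidate = current[:]
            k = rng.randrange(n)
            candidate[k] = not candidate[k]
            if not any(candidate):
                continue  # an empty ensemble cannot vote
            score = accuracy(majority_vote(predictions, candidate), gold)
            delta = score - current_score
            # Always accept improvements; accept worse moves with
            # probability exp(delta / t), which shrinks as t cools.
            if delta >= 0 or rng.random() < math.exp(delta / t):
                current, current_score = candidate, score
                if current_score > best_score:
                    best, best_score = current[:], current_score
        t *= alpha  # geometric cooling
    return best, best_score
```

Calling `anneal_ensemble(predictions, gold)` returns the best mask found and its accuracy. A multi-objective variant in the spirit of AMOSA would instead track several objectives and keep an archive of non-dominated solutions; that is not shown here.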