A comparative study on feature reduction approaches in Hindi and Bengali named entity recognition

  • Authors:
  • Sujan Kumar Saha;Pabitra Mitra;Sudeshna Sarkar

  • Affiliations:
  • Dept. of CSE, Birla Institute of Technology Mesra, Ranchi 835215, India;Dept. of CSE, Indian Institute of Technology Kharagpur, Kharagpur 721302, India;Dept. of CSE, Indian Institute of Technology Kharagpur, Kharagpur 721302, India

  • Venue:
  • Knowledge-Based Systems
  • Year:
  • 2012

Abstract

Features used for named entity recognition (NER) are often high-dimensional. Such feature spaces can cause overfitting when training data are insufficient, and in these situations dimensionality reduction can improve performance. A number of dimensionality reduction approaches exist, based on either feature selection or feature extraction. In this paper we present a comprehensive comparative study of different dimensionality reduction approaches applied to the NER task. To compare the performance of the various approaches, we consider two Indian languages, Hindi and Bengali. NER accuracies achieved in these languages are still comparatively poor, primarily due to the scarcity of annotated corpora. For both languages, dimensionality reduction is found to improve the performance of the classifiers. The paper presents a detailed comparison of the effectiveness of several dimensionality reduction techniques.
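The abstract distinguishes feature selection from feature extraction. As a minimal, hypothetical sketch of the feature-selection side only, the Python below ranks binary token features by information gain with respect to the NER labels and keeps the top k. The information-gain criterion, the feature names, and the toy data are assumptions chosen for illustration; they are not the specific method, features, or data used in the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(column, labels):
    """Information gain of one binary feature column with respect to the labels."""
    n = len(labels)
    on = [y for x, y in zip(column, labels) if x]
    off = [y for x, y in zip(column, labels) if not x]
    conditional = (len(on) / n) * entropy(on) + (len(off) / n) * entropy(off)
    return entropy(labels) - conditional


def select_top_k(X, labels, feature_names, k):
    """Keep the names of the k features with the highest information gain.

    Ties are broken alphabetically by feature name so the result is deterministic.
    """
    scored = []
    for j, name in enumerate(feature_names):
        column = [row[j] for row in X]
        scored.append((information_gain(column, labels), name))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [name for _, name in scored[:k]]


# Hypothetical toy data: one row per token, one binary column per feature.
feature_names = ["prev_word_is_title", "suffix_match", "noise"]
X = [
    [1, 0, 1],  # PER
    [1, 0, 0],  # PER
    [0, 1, 1],  # LOC
    [0, 1, 0],  # LOC
    [0, 0, 1],  # O
    [0, 0, 0],  # O
]
labels = ["PER", "PER", "LOC", "LOC", "O", "O"]

print(select_top_k(X, labels, feature_names, 2))
```

On this toy data the "noise" column is split evenly across all three classes, so its information gain is zero and it is the feature dropped. A feature-extraction approach would instead map all columns into a new, smaller set of derived features rather than keeping a subset of the originals.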