Automatic feature selection for named entity recognition using genetic algorithm

  • Authors:
  • Huong Thanh Le; Luan Van Tran

  • Affiliations:
  • Hanoi University of Science and Technology, Hanoi, Vietnam; Hanoi University of Science and Technology, Hanoi, Vietnam

  • Venue:
  • Proceedings of the Fourth Symposium on Information and Communication Technology
  • Year:
  • 2013

Abstract

This paper presents a feature selection approach for named entity recognition using a genetic algorithm. Several aspects of the genetic algorithm, including its computational time and the criteria for evaluating an individual (i.e., the size of the feature subset and the classifier's accuracy), are analyzed in order to optimize its learning process. Two machine learning algorithms, k-Nearest Neighbor and Conditional Random Fields, are used to calculate the accuracy of the named entity recognition system. To evaluate the effectiveness of the proposed genetic algorithm, the feature subsets it returns are compared with those returned by a hill climbing algorithm and a backward selection algorithm. Experimental results show that the feature subsets obtained by the genetic algorithm are much smaller than the original feature set, with no loss of predictive accuracy. Furthermore, these feature subsets yield higher classifier accuracy than the subsets returned by the hill climbing and backward selection algorithms.
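The abstract does not specify the chromosome encoding, the genetic operators, or how subset size and accuracy are combined into one score. The following is a minimal sketch of genetic-algorithm feature selection under stated assumptions, not the authors' method. Each individual is assumed to be a bit mask over the features. Selection is by tournament, with one-point crossover, bit-flip mutation, and elitism. A hypothetical fitness subtracts a size penalty `alpha` from accuracy. `accuracy_fn` stands in for training a kNN or CRF model on the selected features and scoring it on held-out data. `toy_accuracy` and `INFORMATIVE` are invented purely for illustration.

```python
import functools
import random
from typing import Callable, List, Tuple

Mask = Tuple[int, ...]  # 1 = feature selected, 0 = feature dropped


def fitness(mask: Mask, accuracy_fn: Callable[[Mask], float], alpha: float) -> float:
    """Hypothetical fitness: reward accuracy, penalize the fraction of features kept."""
    if not any(mask):
        return 0.0  # an empty subset cannot train a classifier
    return accuracy_fn(mask) - alpha * sum(mask) / len(mask)


def tournament(population: List[Mask], score: Callable[[Mask], float],
               rng: random.Random) -> Mask:
    """Binary tournament: pick two random individuals, keep the fitter one."""
    a, b = rng.sample(population, 2)
    return a if score(a) >= score(b) else b


def genetic_feature_selection(n_features: int, accuracy_fn: Callable[[Mask], float],
                              pop_size: int = 30, generations: int = 40,
                              crossover_rate: float = 0.9, mutation_rate: float = 0.02,
                              alpha: float = 0.05, seed: int = 0) -> Mask:
    """Return the best mask found. Assumes n_features >= 2 and pop_size >= 2."""
    rng = random.Random(seed)

    # Cache fitness per mask: in a real NER setting each call trains a classifier,
    # which dominates the computational time.
    @functools.lru_cache(maxsize=None)
    def score(mask: Mask) -> float:
        return fitness(mask, accuracy_fn, alpha)

    population: List[Mask] = [
        tuple(rng.randint(0, 1) for _ in range(n_features)) for _ in range(pop_size)
    ]
    best = max(population, key=score)
    for _ in range(generations):
        offspring: List[Mask] = [best]  # elitism: the best mask survives unchanged
        while len(offspring) < pop_size:
            p1 = tournament(population, score, rng)
            p2 = tournament(population, score, rng)
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1
            # Bit-flip mutation: each bit flips independently with mutation_rate.
            child = tuple(b ^ 1 if rng.random() < mutation_rate else b for b in child)
            offspring.append(child)
        population = offspring
        best = max(population, key=score)
    return best


INFORMATIVE = {0, 2, 5}  # hypothetical: indices of features that actually help


def toy_accuracy(mask: Mask) -> float:
    """Toy stand-in for classifier accuracy: informative features help, others hurt slightly."""
    hits = sum(mask[i] for i in INFORMATIVE)
    noise = sum(mask) - hits
    return 0.5 + 0.15 * hits - 0.01 * noise


if __name__ == "__main__":
    best = genetic_feature_selection(10, toy_accuracy)
    print("selected feature indices:", [i for i, b in enumerate(best) if b])
```

Because of elitism, the best fitness never decreases from one generation to the next. The size penalty `alpha` controls the trade-off the abstract describes between a small feature subset and classifier accuracy. Larger values push the search toward smaller subsets at some risk to accuracy.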