Automatic feature selection for named entity recognition using genetic algorithm

Authors:
Huong Thanh Le;Luan Van Tran
Affiliations:
Hanoi University of Science and Technology, Hanoi, Vietnam;Hanoi University of Science and Technology, Hanoi, Vietnam
Venue:
Proceedings of the Fourth Symposium on Information and Communication Technology
Year:
2013

Abstract

This paper presents a feature selection approach for named entity recognition using a genetic algorithm. Several aspects of the genetic algorithm, including computational time and the criteria for evaluating an individual (i.e., the size of the feature subset and the classifier's accuracy), are analyzed in order to optimize its learning process. Two machine learning algorithms, k-Nearest Neighbor and Conditional Random Fields, are used to calculate the accuracy of the named entity recognition system. To evaluate the effectiveness of our genetic algorithm, the feature subsets it returns are compared with those returned by a hill climbing algorithm and a backward algorithm. Experimental results show that the feature subsets obtained by our genetic algorithm are much smaller than the original feature set, with no loss of predictive accuracy. Furthermore, these feature subsets yield higher classifier accuracy than those produced by the hill climbing and backward algorithms.
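
To make the idea in the abstract concrete, the following is a minimal sketch of genetic-algorithm feature selection in which each individual is a bit mask over the features and its fitness trades classifier accuracy against subset size. It is not the authors' implementation. Everything specific here is an assumption for illustration: the function names, the linear `size_weight` penalty, leave-one-out 1-nearest-neighbour as a stand-in for the k-Nearest Neighbor evaluator, tournament selection, single-point crossover, two-individual elitism, and the synthetic toy data (two informative features plus noise).

```python
import random


def knn_accuracy(X, y, mask):
    """Leave-one-out 1-nearest-neighbour accuracy on the features where mask is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i, xi in enumerate(X):
        best_d, best_label = None, None
        for k, xk in enumerate(X):
            if k == i:
                continue
            d = sum((xi[j] - xk[j]) ** 2 for j in cols)
            if best_d is None or d < best_d:
                best_d, best_label = d, y[k]
        if best_label == y[i]:
            correct += 1
    return correct / len(X)


def fitness(X, y, mask, size_weight=0.1):
    # Assumed trade-off (not from the paper): reward accuracy,
    # penalise the fraction of features that are kept.
    return knn_accuracy(X, y, mask) - size_weight * sum(mask) / len(mask)


def select_features(X, y, pop_size=20, generations=30, mutation_rate=0.05, seed=0):
    """Return the best bit mask found. Needs at least 2 features and pop_size >= 3."""
    rng = random.Random(seed)
    n = len(X[0])
    cache = {}  # fitness is expensive, so score each distinct mask only once

    def score(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(X, y, mask)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        next_pop = [list(m) for m in ranked[:2]]  # elitism: keep the two best
        while len(next_pop) < pop_size:
            # Tournament selection: best of three random individuals, twice.
            p1 = max(rng.sample(ranked, 3), key=score)
            p2 = max(rng.sample(ranked, 3), key=score)
            cut = rng.randint(1, n - 1)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, applied independently to each position.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=score)


def make_toy_data(n_samples=40, n_noise=6, seed=1):
    """Synthetic data: features 0 and 1 carry the label, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        informative = [label + rng.gauss(0, 0.2), -label + rng.gauss(0, 0.2)]
        noise = [rng.gauss(0, 1) for _ in range(n_noise)]
        X.append(informative + noise)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    best = select_features(X, y)
    print("selected mask:", best, "fitness:", fitness(X, y, best))
```

Because the two best individuals are copied unchanged into each new generation, the best fitness in the population never decreases. Raising `size_weight` pushes the search toward smaller subsets at the possible expense of accuracy, which is the trade-off between subset size and classifier accuracy that the abstract describes.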