A fast algorithm for feature selection in conditional maximum entropy modeling

  • Authors:
  • Yaqian Zhou; Lide Wu; Fuliang Weng; Hauke Schmidt

  • Affiliations:
  • Fudan University, Shanghai, P.R. China; Fudan University, Shanghai, P.R. China; Robert Bosch Corp., Palo Alto, CA; Robert Bosch Corp., Palo Alto, CA

  • Venue:
  • EMNLP '03: Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing
  • Year:
  • 2003

Abstract

This paper describes a fast algorithm for selecting features in conditional maximum entropy modeling. Berger et al. (1996) present an incremental feature selection (IFS) algorithm that computes the approximate gains of all candidate features at each selection stage, which is very time-consuming for problems with large feature spaces. The new algorithm instead computes approximate gains only for the top-ranked features, based on the models obtained in previous stages. Experiments on WSJ data from the Penn Treebank show that the new algorithm greatly speeds up feature selection while maintaining the quality of the selected features. A variant of the algorithm with look-ahead functionality is also tested and further confirms the quality of the selected features. The new algorithm is easy to implement, and for a feature space of size F it uses only O(F) more space than the original IFS algorithm.
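
To make the selection strategy concrete, here is a minimal Python sketch of the selective-gain idea the abstract describes: keep candidates ranked by their last-computed approximate gains, and recompute a gain only when a feature reaches the top of the ranking. The `compute_approx_gain` callback and the list-based `model` placeholder are illustrative assumptions, not the paper's actual maximum entropy machinery.

```python
import heapq

def select_features(candidates, compute_approx_gain, num_to_select):
    """Lazily recompute approximate gains, only for features that
    rise to the top of the ranking (sketch of the selective idea).

    compute_approx_gain(feature, model) is a hypothetical callback
    returning the approximate gain of adding `feature` to `model`.
    """
    model = []   # features selected so far (stands in for the model)
    stage = 0    # number of selection stages completed

    # Max-heap keyed by negated gain; each entry records the stage at
    # which its gain was last computed. The index i breaks ties.
    heap = [(-compute_approx_gain(f, model), 0, i, f)
            for i, f in enumerate(candidates)]
    heapq.heapify(heap)

    selected = []
    while heap and len(selected) < num_to_select:
        neg_gain, computed_at, i, f = heapq.heappop(heap)
        if computed_at == stage:
            # Gain is current w.r.t. the model from this stage: select.
            selected.append((f, -neg_gain))
            model.append(f)
            stage += 1
        else:
            # Stale gain: recompute against the current model, reinsert.
            heapq.heappush(heap,
                           (-compute_approx_gain(f, model), stage, i, f))
    return selected
```

In the worst case every entry is recomputed at every stage, matching the cost of IFS; in practice only the few features near the top of the heap are touched per stage. The heap entries and stage markers account for the O(F) extra space claimed in the abstract.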