CLDA: Feature Selection for Text Categorization Based on Constrained LDA

  • Authors:
  • Cui Zifeng; Xu Baowen; Zhang Weifeng; Jiang Dawei; Xu Junling

  • Affiliations:
  • Southeast University, China; Southeast University, China; Nanjing University of Posts and Telecommunications, China; Southeast University, China; Southeast University, China

  • Venue:
  • ICSC '07 Proceedings of the International Conference on Semantic Computing
  • Year:
  • 2007

Abstract

Feature selection is a necessary preprocessing step for pattern classification, machine learning and data mining. It now faces challenges in high-dimensional spaces, such as text categorization in information retrieval. Linear Discriminant Analysis (LDA) is an excellent dimensionality reduction method that transforms the original data into a low-dimensional feature space. However, it changes the original physical features and makes them uninterpretable. This motivates us to select, rather than transform, features for text categorization, following the LDA idea of preserving between-class and within-class structure information. In this paper, a new feature selection approach based on Constrained LDA (CLDA) is proposed, which models feature selection as a search problem in subspace and finds the optimal solution subject to certain constraints. Further, the CLDA optimization problem is transformed into a process of scoring and sorting features. Experiments on 20 Newsgroups and Reuters-21578 show that CLDA consistently outperforms information gain and the χ² test, with lower computational complexity.
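The abstract says CLDA reduces feature selection to scoring and sorting the original features using LDA's between-class and within-class structure. The paper's exact CLDA score is not given here, so the sketch below swaps in the Fisher score, a standard LDA-inspired per-feature criterion: for each feature j, the between-class scatter sum_c n_c (mu_cj - mu_j)^2 divided by the within-class scatter sum_c sum_{x in c} (x_j - mu_cj)^2. The helper names `fisher_scores` and `select_top_k` and the term-frequency input format are hypothetical. This is an illustration of the "score, then sort, keep original features" idea under those assumptions, not the authors' method.

```python
from collections import defaultdict
from typing import Sequence


def fisher_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> list[float]:
    """Per-feature ratio of between-class to within-class scatter.

    ASSUMPTION: Fisher score used as a stand-in for the (unstated) CLDA score.
    X holds one row per document (e.g. term frequencies), y the class labels.
    """
    n = len(X)
    d = len(X[0])
    # Group document rows by their class label.
    by_class: dict[int, list[Sequence[float]]] = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    overall_mean = [sum(row[j] for row in X) / n for j in range(d)]

    scores = []
    for j in range(d):
        between = 0.0  # spread of class means around the global mean
        within = 0.0   # spread of documents around their own class mean
        for rows in by_class.values():
            n_c = len(rows)
            mean_c = sum(r[j] for r in rows) / n_c
            between += n_c * (mean_c - overall_mean[j]) ** 2
            within += sum((r[j] - mean_c) ** 2 for r in rows)
        if within > 0:
            scores.append(between / within)
        else:
            # Perfectly separating feature -> infinite score; constant feature -> 0.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def select_top_k(X: Sequence[Sequence[float]], y: Sequence[int], k: int) -> list[int]:
    """Rank original feature indices by descending score and keep the top k."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy term-frequency matrix: 4 documents, 3 terms, 2 classes (hypothetical data).
    X = [[3, 0, 1], [2, 1, 1], [0, 2, 1], [0, 3, 1]]
    y = [0, 0, 1, 1]
    print(fisher_scores(X, y))    # [12.5, 4.0, 0.0]
    print(select_top_k(X, y, 2))  # [0, 1]
```

Because the result is a list of original feature indices rather than a projection matrix, the selected terms stay interpretable, which is the property the abstract contrasts with plain LDA. Scoring each feature independently also keeps the cost roughly linear in the number of documents times the number of features.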