An improved method of feature selection based on concept attributes in text classification

  • Authors:
  • Shasha Liao; Minghu Jiang

  • Affiliations:
  • Lab of Computational Linguistics, Dept. of Chinese Language, Tsinghua University, Beijing, China (both authors)

  • Venue:
  • ICNC'05: Proceedings of the First International Conference on Advances in Natural Computation, Volume Part I
  • Year:
  • 2005

Abstract

Feature selection and feature weighting are two important parts of automatic text classification. In this paper we propose a new method based on concept attributes. We use the DEF terms of Chinese words to extract concept attributes. We then use a Concept Tree (C-Tree) to give these attributes appropriate weights according to their positions in the C-Tree, because an attribute's position describes its expressive power. If a word's attributes are too weak to carry its main meaning, they are discarded and the original word is retained. Otherwise, the attributes are selected instead of the original word. Our main goal is to balance concept features and word features. We do this by setting a shielded level as the feature-selection threshold after the features have been weighted. The experimental results show that the combined feature set provides enough information for classification while efficiently removing useless features and noise. In our experiments, the feature space shrinks to a much smaller dimension, and category precision is much higher than with word-based selection methods. After testing different shielded levels, we select the best one, at which the average category precision reaches 93.7%. The results also show a further effect: the differences in precision between categories are smaller when combined features are used.
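The abstract only outlines the selection rule. The code below is a minimal Python sketch of the word-versus-concept decision it describes, and several of its choices are assumptions rather than the paper's method:

- The toy concept tree `PARENT` and the lexicon `LEXICON` are hypothetical.
- The depth-based weight `depth(a) / max_depth` is an assumption.
- The shielded level is read as a minimum tree depth below which attributes count as too general.

None of these is the paper's actual C-Tree, DEF-term data, or weighting formula.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical toy concept tree: child -> parent (None marks the root).
# Illustrative only; not the C-Tree used in the paper.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "thing": "entity",
    "physical": "thing",
    "animate": "physical",
    "human": "animate",
    "animal": "animate",
    "inanimate": "physical",
    "artifact": "inanimate",
}

# Hypothetical lexicon: word -> concept attributes (stand-in for DEF terms).
LEXICON: Dict[str, List[str]] = {
    "robot": ["artifact", "inanimate"],
    "object": ["thing"],
    "doctor": ["human"],
}


def depth(node: str) -> int:
    """Distance from the root of the toy tree; the root has depth 0."""
    d = 0
    while PARENT[node] is not None:
        node = PARENT[node]
        d += 1
    return d


def select_features(word: str, attributes: List[str], shielded_level: int) -> Dict[str, float]:
    """Return the weighted features that represent one word.

    Assumptions: deeper (more specific) attributes get larger weights, and
    attributes at depth <= shielded_level are too weak and are dropped.
    """
    max_depth = max(depth(n) for n in PARENT)
    strong = {a: depth(a) / max_depth for a in attributes if depth(a) > shielded_level}
    if not strong:
        # No attribute survives the threshold: keep the original word instead.
        return {word: 1.0}
    return strong


def document_features(words: List[str], lexicon: Dict[str, List[str]], shielded_level: int) -> Counter:
    """Sum the selected word/concept features over all words of a document."""
    features: Counter = Counter()
    for w in words:
        # Words without lexicon entries have no attributes, so they stay as words.
        for feature, weight in select_features(w, lexicon.get(w, []), shielded_level).items():
            features[feature] += weight
    return features


if __name__ == "__main__":
    print(document_features(["robot", "object", "doctor", "doctor"], LEXICON, 2))
```

Raising `shielded_level` discards more of the general attributes, so more words fall back to word features. Lowering it pushes the feature set toward concepts. This mirrors the trade-off the abstract tunes when it picks the best shielded level, though every number in this toy tree is illustrative.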