A RBF network for Chinese text classification based on concept feature extraction
ICONIP'06 Proceedings of the 13th international conference on Neural information processing - Volume Part III
ICIC'06 Proceedings of the 2006 international conference on Intelligent Computing - Volume Part I
Feature selection and feature weighting are two important parts of automatic text classification. In this paper we propose a new method based on concept attributes. We use the DEF terms of Chinese words to extract concept attributes, and a Concept Tree (C-Tree) to assign these attributes proper weights according to their positions in the C-Tree, since this information describes the expressive power of the attributes. If the attributes are too weak to sustain the main meaning of a word, they are discarded and the original word is retained; otherwise, the attributes are selected in place of the original word. Our main research goal is to balance concept features and word features by setting a shielded level as the threshold for feature selection after these features are weighted. The experimental results show that the combined feature set carries enough information for classification while efficiently removing useless features and noise. In our experiments, the feature dimension is reduced to a much smaller space, and the classification precision is substantially better than that of word-based selection methods. By trying different shielded levels, we select the best one, at which the average classification precision reaches 93.7%. The results also yield an additional finding: the precision differences between categories are smaller when the combined features are used.
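The selection rule described above can be sketched in code. This is a minimal illustration, not the paper's actual implementation: the `ConceptTree` class, the depth-based weighting formula, and the `shielded_level` parameter are all assumptions standing in for the paper's C-Tree weighting and threshold.

```python
# Hypothetical sketch of concept-attribute selection with a shielded level.
# All names and the weighting formula are illustrative assumptions.

class ConceptTree:
    """A toy concept hierarchy: each attribute maps to its parent."""
    def __init__(self, parent):
        self.parent = parent  # attribute -> parent attribute (None = root)

    def depth(self, attr):
        d = 0
        while self.parent.get(attr) is not None:
            attr = self.parent[attr]
            d += 1
        return d

    def weight(self, attr, decay=0.8):
        # One simple assumption for "position in the C-Tree":
        # deeper (more specific) attributes get higher weight.
        return 1.0 - decay ** (self.depth(attr) + 1)

def select_features(word, def_attrs, tree, shielded_level):
    """Replace a word with its concept attributes only when those
    attributes are strong enough to carry the word's meaning."""
    weighted = [(a, tree.weight(a)) for a in def_attrs]
    strong = [(a, w) for a, w in weighted if w >= shielded_level]
    if strong:            # attributes sustain the main meaning
        return strong     # use concept features instead of the word
    return [(word, 1.0)]  # attributes too weak: keep the original word

# Toy hierarchy: entity -> animal -> dog
tree = ConceptTree({"entity": None, "animal": "entity", "dog": "animal"})
print(select_features("puppy", ["dog", "animal"], tree, shielded_level=0.4))
print(select_features("thing", ["entity"], tree, shielded_level=0.4))
```

Raising `shielded_level` keeps more original words (fewer but stronger concept features); lowering it replaces more words with attributes, which is the balance the method tunes.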