A RBF network for Chinese text classification based on concept feature extraction
ICONIP'06 Proceedings of the 13th international conference on Neural information processing - Volume Part III
ICIC'06 Proceedings of the 2006 international conference on Intelligent Computing - Volume Part I
Feature selection and feature weighting are two important parts of automatic text classification. In this paper we propose a new method based on concept attributes. We use the DEF terms of Chinese words to extract concept attributes, and a Concept Tree (C-Tree) to assign these attributes proper weights according to their positions in the C-Tree, since this information describes the expressive power of the attributes. If the attributes are too weak to sustain the main meaning of a word, they are discarded and the original word is retained; otherwise, the attributes are selected in place of the original word. Our main research goal is to balance concept features and word features by setting a shielded level as the threshold for feature selection after these features are weighted. The experimental results show that the combined feature set carries enough information for classification while efficiently removing useless features and noise. In our experiments, the feature dimension is reduced to a much smaller space, and the classification precision is substantially better than that of word-based selection methods. By trying different shielded levels, we select the best one, at which the average classification precision reaches 93.7%. The results also yield an additional finding: the precision differences between categories are smaller when the combined features are used.
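The selection rule described above can be sketched in code. This is a minimal illustration, not the paper's actual implementation: the `ConceptTree` class, the depth-based weighting formula, and the `shielded_level` parameter are all assumptions standing in for the paper's C-Tree weighting and threshold.

```python
# Hypothetical sketch of concept-attribute selection with a shielded level.
# All names and the weighting formula are illustrative assumptions.

class ConceptTree:
    """A toy concept hierarchy: each attribute maps to its parent."""
    def __init__(self, parent):
        self.parent = parent  # attribute -> parent attribute (None = root)

    def depth(self, attr):
        d = 0
        while self.parent.get(attr) is not None:
            attr = self.parent[attr]
            d += 1
        return d

    def weight(self, attr, decay=0.8):
        # One simple assumption for "position in the C-Tree":
        # deeper (more specific) attributes get higher weight.
        return 1.0 - decay ** (self.depth(attr) + 1)

def select_features(word, def_attrs, tree, shielded_level):
    """Replace a word with its concept attributes only when those
    attributes are strong enough to carry the word's meaning."""
    weighted = [(a, tree.weight(a)) for a in def_attrs]
    strong = [(a, w) for a, w in weighted if w >= shielded_level]
    if strong:            # attributes sustain the main meaning
        return strong     # use concept features instead of the word
    return [(word, 1.0)]  # attributes too weak: keep the original word

# Toy hierarchy: entity -> animal -> dog
tree = ConceptTree({"entity": None, "animal": "entity", "dog": "animal"})
print(select_features("puppy", ["dog", "animal"], tree, shielded_level=0.4))
print(select_features("thing", ["entity"], tree, shielded_level=0.4))
```

Raising `shielded_level` keeps more original words (fewer but stronger concept features); lowering it replaces more words with attributes, which is the balance the method tunes.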