Improving text classification with concept index terms and expansion terms

Authors:
XiangHua Fu; LianDong Liu; TianXue Gong; Lan Tao
Affiliations:
College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong, China (all authors)
Venue:
ISNN'11 Proceedings of the 8th international conference on Advances in neural networks - Volume Part III
Year:
2011

Abstract

Feature selection methods are widely used to improve classification accuracy by removing redundant and noisy features. However, removing terms from documents may damage the integrity of their content. To balance document integrity against classification performance, we propose a novel two-step classification method. First, we select index terms and expansion terms through Maximum-Relevance and Minimum-Redundancy Analysis (MR2A). Then, we combine the predictive power of the index terms and expansion terms via Concept Similarity Mapping (CSM). Experiments on the 20Newsgroups and SOGOU datasets are carried out with different classifiers. The experimental results show that both CSM and MR2A outperform the baseline methods, Information Gain and Chi-square.
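
The abstract does not spell out how MR2A scores terms, so the following is only a minimal sketch of the general maximum-relevance, minimum-redundancy idea, not the authors' method. It assumes binary term-presence features, mutual information as both the relevance and the redundancy measure, and a greedy "relevance minus mean redundancy" criterion. The function names, the toy terms, and the data are all hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


def mrmr_select(features, labels, k):
    """Greedily pick k terms that are relevant to the labels but not to each other.

    features: dict mapping a term to its per-document presence values (assumed layout).
    """
    relevance = {t: mutual_information(col, labels) for t, col in features.items()}
    selected = []
    candidates = set(features)

    def score(term):
        if not selected:
            return relevance[term]
        # Mean mutual information with the terms already chosen
        redundancy = sum(
            mutual_information(features[term], features[s]) for s in selected
        ) / len(selected)
        return relevance[term] - redundancy

    while candidates and len(selected) < k:
        # sorted() makes tie-breaking deterministic (alphabetical order)
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy corpus: 8 documents, the first 4 belong to the positive class.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "goal":   [1, 1, 1, 0, 0, 0, 0, 0],
    "match":  [1, 1, 1, 0, 0, 0, 0, 0],  # exact duplicate of "goal"
    "league": [0, 0, 1, 1, 0, 0, 0, 0],  # weaker, but covers a different document
}

print(mrmr_select(features, labels, 2))
```

In this toy data, "match" is as relevant as "goal" but fully redundant with it, so the second pick is the weaker but complementary "league". A purely relevance-ranked baseline such as Information Gain would instead keep both duplicates.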