Improving text classification with concept index terms and expansion terms

  • Authors:
  • XiangHua Fu;LianDong Liu;TianXue Gong;Lan Tao

  • Affiliations:
  • College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong, China (all four authors)

  • Venue:
  • ISNN'11 Proceedings of the 8th international conference on Advances in neural networks - Volume Part III
  • Year:
  • 2011

Abstract

Feature selection methods are widely employed to improve classification accuracy by removing redundant and noisy features. However, removing terms from documents may damage the integrity of their content. To balance document integrity against classification performance, we propose a novel two-step classification method. First, we select index terms and expansion terms through Maximum-Relevance and Minimum-Redundancy Analysis (MR2A). Second, we combine the predictive power of the index terms and expansion terms via Concept Similarity Mapping (CSM). Experiments on the 20Newsgroups and SOGOU datasets are carried out with different classifiers. The experimental results show that both CSM and MR2A outperform the baseline methods, Information Gain and Chi-square.
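
The abstract does not spell out how MR2A scores terms, so the following is only a minimal sketch of the general maximum-relevance, minimum-redundancy idea, not the authors' method. It assumes binary term-presence features, uses mutual information for both relevance (term vs. class) and redundancy (term vs. already selected terms), and picks terms greedily. The function names `mutual_information` and `greedy_mrmr` and the toy data are hypothetical, introduced purely for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def greedy_mrmr(features, labels, k):
    """Greedily pick k terms by relevance minus mean redundancy.

    features: dict mapping term -> list of 0/1 presence flags per document
    labels:   list of class labels, one per document
    (Assumed scoring rule; the paper's MR2A may differ.)
    """
    relevance = {t: mutual_information(col, labels) for t, col in features.items()}
    selected = []
    candidates = set(features)
    while candidates and len(selected) < k:

        def score(term):
            if not selected:
                return relevance[term]
            redundancy = sum(
                mutual_information(features[term], features[s]) for s in selected
            ) / len(selected)
            return relevance[term] - redundancy

        # sorted() makes tie-breaking deterministic (alphabetical order)
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy corpus: 8 documents, 2 classes; "match" is an exact duplicate of "goal".
labels = ["sport"] * 4 + ["tech"] * 4
features = {
    "goal": [1, 1, 1, 0, 0, 0, 0, 0],
    "match": [1, 1, 1, 0, 0, 0, 0, 0],
    "cpu": [0, 0, 0, 0, 1, 1, 0, 0],
}
selected = greedy_mrmr(features, labels, 2)
print(selected)
```

On this toy data, ranking by relevance alone would keep both "goal" and its duplicate "match"; the redundancy penalty makes the greedy step prefer "cpu" as the second term instead. Information Gain and Chi-square, the baselines named in the abstract, score each term independently and therefore have no such penalty.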