Dimensionality reduction with category information fusion and non-negative matrix factorization for text categorization

  • Authors:
  • Wenbin Zheng; Yuntao Qian; Hong Tang

  • Affiliations:
  • College of Computer Science and Technology, Zhejiang University, Hangzhou, China and College of Information Engineering, China Jiliang University, Hangzhou, China; College of Computer Science and Technology, Zhejiang University, Hangzhou, China; School of Aeronautics and Astronautics, Zhejiang University, Hangzhou, China and College of Metrological Technology & Engineering, China Jiliang University, Hangzhou, China

  • Venue:
  • AICI'11 Proceedings of the Third International Conference on Artificial Intelligence and Computational Intelligence - Volume Part III
  • Year:
  • 2011

Abstract

Dimensionality reduction can effectively improve the computational performance of classifiers in text categorization, and non-negative matrix factorization (NMF) can easily map the high-dimensional term space into a low-dimensional semantic subspace. Moreover, the non-negativity of the basis vectors gives the semantic subspace a meaningful interpretation. However, because NMF is a linear reconstruction method, it is sensitive to noise, missing data, and outliers, and it usually fails to achieve satisfactory classification performance. This paper proposes a novel approach in which the training texts and their category information are fused, and a transformation matrix that maps the term space into a semantic subspace is obtained by basis-orthogonal non-negative matrix factorization followed by truncation. With this transformation, the dimensionality can be reduced aggressively. Experimental results show that the proposed approach maintains good classification performance even at very low dimensionality.
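The abstract does not spell out how category information is fused, which update rules enforce basis orthogonality, or what exactly is truncated. The NumPy sketch below illustrates the overall pipeline only under stated assumptions, and is not the authors' method. (1) Fusion is assumed to stack a scaled one-hot label block (hypothetical `weight` parameter) beneath the term-document matrix. (2) Basis orthogonality is approximated with a soft penalty on the off-diagonal entries of W^T W, using multiplicative updates derived by splitting the gradient into positive and negative parts; this is a stand-in, not necessarily the paper's rule. (3) Truncation is assumed to mean dropping the category rows of the learned basis, leaving a term-to-subspace transformation matrix that can presumably be applied to unlabeled documents. All function names are hypothetical.

```python
import numpy as np


def fuse_category_info(X, labels, n_classes, weight=1.0):
    """Stack a scaled one-hot category block under the term-document matrix.

    X: non-negative (n_terms, n_docs); labels[j] is the class index of doc j.
    ASSUMPTION: row-stacking fusion and `weight` are illustrative choices.
    """
    n_docs = X.shape[1]
    C = np.zeros((n_classes, n_docs))
    C[labels, np.arange(n_docs)] = weight  # one nonzero per document column
    return np.vstack([X, C])


def orthogonal_nmf(V, k, alpha=1.0, n_iter=500, eps=1e-9, seed=0):
    """Factor V ~= W @ H (W, H >= 0) with a soft orthogonality penalty on W.

    Objective: 0.5*||V - W H||_F^2 + 0.5*alpha*sum_{i != j} (W^T W)_ij.
    ASSUMPTION: penalty-based stand-in for basis-orthogonal NMF.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    ones = np.ones((k, k))
    for _ in range(n_iter):
        # Standard Lee-Seung update for H (the penalty does not involve H).
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        # Gradient of the penalty is alpha*(W @ ones - W): the negative part
        # (alpha*W) goes in the numerator, the positive part in the denominator.
        W *= (V @ H.T + alpha * W) / (W @ (H @ H.T) + alpha * W @ ones + eps)
    return W, H


def truncate_basis(W, n_terms):
    """Keep only the term rows of the fused basis: an (n_terms, k) transform."""
    return W[:n_terms, :]


def reduce_dim(T, X_docs):
    """Project term vectors (n_terms, n_docs) into the k-dim subspace."""
    return T.T @ X_docs


# Toy usage: 50 terms, 20 documents, 2 classes, 4 latent dimensions.
rng = np.random.default_rng(1)
X = rng.random((50, 20))
labels = np.array([0] * 10 + [1] * 10)
V = fuse_category_info(X, labels, n_classes=2, weight=5.0)
W, H = orthogonal_nmf(V, k=4)
T = truncate_basis(W, n_terms=X.shape[0])
Z = reduce_dim(T, X)
print(Z.shape)  # (4, 20)
```

Dropping the category rows after factorization means the label information shapes the basis during training, while new documents need only their term vectors to be projected. A larger `alpha` pushes the basis columns closer to mutual orthogonality at some cost in reconstruction accuracy.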