Difference-similitude matrix in text classification

  • Authors:
  • Xiaochun Huang;Ming Wu;Delin Xia;Puliu Yan

  • Affiliations:
  • School of Electronic Information, Wuhan University, Wuhan, Hubei, China;School of Electronic Information, Wuhan University, Wuhan, Hubei, China;School of Electronic Information, Wuhan University, Wuhan, Hubei, China;School of Electronic Information, Wuhan University, Wuhan, Hubei, China

  • Venue:
  • FSKD'05 Proceedings of the Second International Conference on Fuzzy Systems and Knowledge Discovery - Volume Part II
  • Year:
  • 2005

Abstract

Text classification can greatly improve the performance of information retrieval and information filtering, but the high dimensionality of documents hinders the application of most classification approaches. This paper proposes a method based on the Difference-Similitude Matrix (DSM) to address the problem. The method represents a pre-classified collection as an item-document matrix, in which documents in the same category are described by their similarities, while documents in different categories are described by their differences. Using the DSM reduction algorithm, which is simpler and more efficient than rough set reduction, we reduce the dimensionality of the document space and generate rules for text classification.
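
The abstract does not give the matrix definition or the reduction algorithm, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes binary term presence as the item values, records for each pair of documents the terms on which they agree (same category) or disagree (different categories), and then uses a plain greedy cover over the difference sets as a stand-in for the paper's DSM reduction. The function names `build_dsm` and `greedy_reduct` and the toy documents are hypothetical and do not come from the paper.

```python
from itertools import combinations

def build_dsm(docs, labels):
    """Pairwise matrix sketch: similarity sets for same-category pairs,
    difference sets for different-category pairs (assumed definition)."""
    terms = sorted({t for d in docs for t in d})
    dsm = {}
    for i, j in combinations(range(len(docs)), 2):
        # Terms on which the two documents take the same value (absent = 0).
        same = {t for t in terms if docs[i].get(t, 0) == docs[j].get(t, 0)}
        if labels[i] == labels[j]:
            dsm[(i, j)] = ("sim", same)
        else:
            dsm[(i, j)] = ("diff", set(terms) - same)
    return dsm

def greedy_reduct(dsm):
    """Greedy cover of all difference sets. This is a stand-in heuristic,
    not the DSM reduction algorithm described in the paper."""
    remaining = [s for kind, s in dsm.values() if kind == "diff" and s]
    chosen = []
    while remaining:
        candidates = sorted(set().union(*remaining))
        # Pick the term that separates the most still-unseparated pairs.
        best = max(candidates, key=lambda t: sum(t in s for s in remaining))
        chosen.append(best)
        remaining = [s for s in remaining if best not in s]
    return chosen

# Toy pre-classified collection (hypothetical data).
docs = [
    {"goal": 1, "match": 1},
    {"goal": 1, "team": 1},
    {"stock": 1, "market": 1},
    {"stock": 1, "team": 1},
]
labels = ["sports", "sports", "finance", "finance"]

dsm = build_dsm(docs, labels)
reduct = greedy_reduct(dsm)
print(reduct)  # ['goal']
```

In this toy collection a single term is enough to distinguish every pair of documents from different categories, which illustrates how such a reduction can shrink the document space before classification rules are generated.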