Difference-similitude matrix in text classification

Authors:
Xiaochun Huang;Ming Wu;Delin Xia;Puliu Yan
Affiliations:
School of Electronic Information, Wuhan University, Wuhan, Hubei, China (all authors)
Venue:
FSKD'05 Proceedings of the Second international conference on Fuzzy Systems and Knowledge Discovery - Volume Part II
Year:
2005

Abstract

Text classification can greatly improve the performance of information retrieval and information filtering, but the high dimensionality of documents hinders the application of most classification approaches. This paper proposes a Difference-Similitude Matrix (DSM) based method to address the problem. The method represents a pre-classified collection as an item-document matrix, in which documents in the same category are described by their similarities, while documents in different categories are described by their differences. Using the DSM reduction algorithm, which is simpler and more efficient than rough set reduction, we reduce the dimensionality of the document space and generate rules for text classification.
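
The abstract does not give the construction in detail, so the following is only a minimal sketch of the idea under stated assumptions: documents are treated as sets of terms, a same-category pair records the terms on which the two documents agree, and a different-category pair records the terms on which they disagree. The names build_dsm and greedy_reduct are hypothetical, counting terms absent from both documents as an agreement is an assumption, and the greedy selection is a generic set-cover heuristic standing in for the paper's DSM reduction algorithm, which is not reproduced here.

```python
from itertools import combinations


def build_dsm(docs, labels):
    """Build a pairwise difference-similitude matrix (illustrative only).

    docs:   list of sets, the terms present in each document.
    labels: list of category labels, one per document.
    Returns a dict mapping (i, j), i < j, to (kind, terms) where kind is
    "sim" for same-category pairs and "diff" for different-category pairs.
    """
    vocab = set().union(*docs)
    dsm = {}
    for i, j in combinations(range(len(docs)), 2):
        # Assumption: a term "agrees" if it is present in both or absent in both.
        agree = {t for t in vocab if (t in docs[i]) == (t in docs[j])}
        if labels[i] == labels[j]:
            dsm[(i, j)] = ("sim", frozenset(agree))
        else:
            dsm[(i, j)] = ("diff", frozenset(vocab - agree))
    return dsm


def greedy_reduct(dsm):
    """Pick terms until every different-category pair is distinguished.

    This is a plain greedy set cover, not the paper's reduction algorithm.
    """
    pending = [terms for kind, terms in dsm.values() if kind == "diff" and terms]
    chosen = []
    while pending:
        counts = {}
        for terms in pending:
            for t in terms:
                counts[t] = counts.get(t, 0) + 1
        # Sort first so ties are broken alphabetically and the result is stable.
        best = max(sorted(counts), key=counts.get)
        chosen.append(best)
        pending = [terms for terms in pending if best not in terms]
    return chosen


# Toy collection (made-up data, for illustration only).
docs = [{"ball", "team"}, {"ball", "goal"}, {"vote", "party"}, {"vote", "law"}]
labels = ["sport", "sport", "politics", "politics"]

dsm = build_dsm(docs, labels)
reduct = greedy_reduct(dsm)
print(reduct)  # ['ball']
```

On this toy collection a single term is enough to separate the two categories, which illustrates how the difference entries can drive dimensionality reduction; the similarity entries would then be the natural source for per-category rules.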