Text categorization based on fuzzy soft set theory

  • Authors:
  • Bana Handaga;Mustafa Mat Deris

  • Affiliations:
  • Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Johor, Malaysia;Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Johor, Malaysia

  • Venue:
  • ICCSA'12 Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part IV
  • Year:
  • 2012

Abstract

In this paper, we propose a new method for text categorization based on fuzzy soft set theory, called the fuzzy soft set classifier (FSSC). We use a fuzzy soft set representation derived from the bag-of-words representation, in which each term is a distinct word in the set of words of the document collection. The FSSC categorizes each document using the fuzzy c-means formula for classification, and uses fuzzy soft set similarity to measure the distance between two documents. We run experiments on the standard Reuters-21578 dataset with three kinds of weighting (boolean, term frequency, and term frequency-inverse document frequency) and compare the performance of FSSC with four other classifiers: kNN, Bayesian, Rocchio, and SVM. Performance is evaluated by precision, recall, F-measure, return-size, and running time. The results show that there is no absolute winner. The FSSC has lower precision, recall, and F-measure than SVM and kNN, but runs faster than both. Compared with the Bayesian and Rocchio classifiers, the FSSC runs more slowly but achieves higher precision and F-measure.