Class-indexing-based term weighting for automatic text classification

  • Authors:
  • Fuji Ren; Mohammad Golam Sohrab

  • Affiliations:
  • Faculty of Engineering, University of Tokushima, 2-1 Minami Josanjima, Tokushima 770-8506, Japan; Faculty of Engineering, University of Tokushima, 2-1 Minami Josanjima, Tokushima 770-8506, Japan

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2013

Abstract

Most previous studies on term weighting for automatic text classification (ATC) have emphasized document-indexing-based approaches and approaches based on the four fundamental information elements. In this study, we introduce class-indexing-based term weighting approaches. We evaluate their effects in both high-dimensional and comparatively low-dimensional vector spaces against TF.IDF and five other term weighting approaches, which serve as baselines. First, we implement a class-indexing-based TF.IDF.ICF observational term weighting approach that incorporates the inverse class frequency (ICF). We investigate the effects of TF.IDF.ICF on three benchmark collections: Reuters-21578, 20 Newsgroups, and RCV1-v2. In the text classification (TC) task, TF.IDF.ICF provides positive discrimination on rare terms in the vector space but is biased against frequent terms. We therefore revise the ICF function into a new inverse class space density frequency (ICSδF) and derive the TF.IDF.ICSδF method, which provides positive discrimination on both infrequent and frequent terms. We present a detailed per-category evaluation of each term weighting approach on the three datasets. The experimental results show that the proposed class-indexing-based TF.IDF.ICSδF approach is promising compared with the well-known baseline term weighting approaches.
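The difference between the two class-indexing factors can be illustrated with a short Python sketch. It assumes forms that this abstract does not state. First, ICF(t) = 1 + log(|C| / c(t)), where c(t) is the number of classes containing term t. Second, ICSδF(t) = 1 + log(|C| / CSδ(t)), where the class space density CSδ(t) sums, over each class c_k, the fraction of c_k's documents that contain t. IDF is likewise taken as 1 + log(N / d(t)), and TF is taken as raw counts. The paper's exact definitions and smoothing may differ. `class_factors`, `weight_documents`, and the toy corpus are hypothetical illustrations, and each document is assumed to carry a single label.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus (not from the paper): tokenized docs, one label each.
TOY_DOCS = [
    ["goal", "match", "team"],
    ["goal", "vote"],
    ["vote", "party"],
    ["party", "vote", "vote"],
]
TOY_LABELS = ["sport", "sport", "politics", "politics"]


def class_factors(docs, labels):
    """Return per-term IDF, ICF and ICSdF factors.

    ASSUMPTION: all three use a '1 + log(...)' form; the paper may differ.
    """
    n_docs = len(docs)
    n_classes = len(set(labels))
    class_size = Counter(labels)            # N_{c_k}: documents per class
    doc_freq = Counter()                    # d(t): documents containing t
    class_doc_freq = defaultdict(Counter)   # n_{t,c_k}: docs of class c_k containing t
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            doc_freq[term] += 1
            class_doc_freq[term][label] += 1

    idf, icf, icsdf = {}, {}, {}
    for term, d in doc_freq.items():
        per_class = class_doc_freq[term]
        idf[term] = 1.0 + math.log(n_docs / d)
        # ICF uses c(t), the number of classes in which t occurs at least once.
        icf[term] = 1.0 + math.log(n_classes / len(per_class))
        # ICSdF uses the class space density: the sum over classes of the
        # fraction of that class's documents that contain t (0 < density <= |C|).
        density = sum(n / class_size[c] for c, n in per_class.items())
        icsdf[term] = 1.0 + math.log(n_classes / density)
    return idf, icf, icsdf


def weight_documents(docs, labels):
    """Weight each document with TF.IDF.ICF and TF.IDF.ICSdF (raw term counts)."""
    idf, icf, icsdf = class_factors(docs, labels)
    tfidf_icf, tfidf_icsdf = [], []
    for tokens in docs:
        tf = Counter(tokens)
        tfidf_icf.append({t: n * idf[t] * icf[t] for t, n in tf.items()})
        tfidf_icsdf.append({t: n * idf[t] * icsdf[t] for t, n in tf.items()})
    return tfidf_icf, tfidf_icsdf


if __name__ == "__main__":
    _, icf, icsdf = class_factors(TOY_DOCS, TOY_LABELS)
    for term in sorted(icf):
        print(f"{term:6s} ICF={icf[term]:.3f} ICSdF={icsdf[term]:.3f}")
```

On this toy corpus, ICF gives "goal" and "match" the same factor because each occurs in only one class. ICSδF separates them, because "goal" appears in every sport document and "match" appears in only half of them. The term "vote" occurs in both classes, so its ICF log term is zero, while its ICSδF factor stays above 1. In a real classifier, the class factors would be computed on the labeled training set and then applied to unseen documents.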