Using the feature projection technique based on a normalized voting method for text classification

  • Authors:
  • Youngjoong Ko; Jungyun Seo

  • Affiliations:
  • NLP Lab, Department of Computer Science, Sogang University, Sinsu-dong 1, Mapo-gu, Seoul 121-742, South Korea;NLP Lab, Department of Computer Science, Sogang University, Sinsu-dong 1, Mapo-gu, Seoul 121-742, South Korea

  • Venue:
  • Information Processing and Management: an International Journal
  • Year:
  • 2004

Abstract

This paper proposes a new approach to text categorization based on a feature projection technique. In our approach, training data are represented as the projections of the training documents onto each feature. Voting for a classification is carried out on the basis of the individual feature projections, and the final classification of a test document is determined by majority voting over the individual classifications made by each feature. Our empirical results show that the proposed approach, text categorization using feature projections (TCFP), outperforms k-NN, Rocchio, and Naive Bayes. Most notably, TCFP is a fast classifier, up to one hundred times faster than k-NN on the Newsgroups data set. It is also robust to noisy data. Because the TCFP algorithm is very simple, its implementation and training process are straightforward. For these reasons, TCFP can be a useful classifier for text categorization tasks that require fast execution, robustness, and high performance.
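
The idea of voting over feature projections can be illustrated with a minimal sketch. This is not the authors' TCFP implementation: the abstract does not give the weighting or normalization formulas, so the sketch assumes raw term frequency as the feature weight and assumes that each feature's votes are normalized to sum to one before being added up. The names train and classify and the toy documents are hypothetical.

```python
from collections import Counter, defaultdict


def train(docs):
    """Build feature projections: feature -> list of (weight, label).

    Assumption: the weight is the raw term frequency in the document.
    """
    projections = defaultdict(list)
    for tokens, label in docs:
        for term, tf in Counter(tokens).items():
            projections[term].append((tf, label))
    return projections


def classify(projections, tokens):
    """Let every known feature of the test document vote; top total wins."""
    totals = defaultdict(float)
    for term, tf in Counter(tokens).items():
        points = projections.get(term)
        if not points:
            continue  # a feature never seen in training casts no vote
        # Sum the training weights per class on this feature's projection.
        per_class = defaultdict(float)
        for weight, label in points:
            per_class[label] += weight
        # Assumed normalization: each feature's votes sum to one,
        # then are scaled by the feature's frequency in the test document.
        norm = sum(per_class.values())
        for label, score in per_class.items():
            totals[label] += tf * score / norm
    if not totals:
        return None  # no known feature in the test document
    return max(totals, key=totals.get)


# Toy data, invented for illustration only.
training = [
    ("goal match team score".split(), "sports"),
    ("team coach match".split(), "sports"),
    ("stock market price".split(), "finance"),
    ("market price trade score".split(), "finance"),
]
model = train(training)
prediction = classify(model, "team match price".split())
```

On this toy data the sample document is assigned to sports, because two of its three features occur only in sports documents. Classification only looks up the projections of the features present in the test document, without comparing against every training document as k-NN does, which is consistent with the speed advantage the abstract reports.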