Manipulating large corpora for text classification

  • Authors:
  • Fumiyo Fukumoto;Yoshimi Suzuki

  • Affiliations:
  • Yamanashi University, Kofu, Japan;Yamanashi University, Kofu, Japan

  • Venue:
  • EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper, we address the problem of dealing with a large collection of data and propose a method for text classification which manipulates data using two well-known machine learning techniques, Naive Bayes(NB) and Support Vector Machines(SVMs). NB is based on the assumption of word independence in a text, which makes the computation of it far more efficient. SVMs, on the other hand, have the potential to handle large feature spaces, which makes it possible to produce better performance. The training data for SVMs are extracted using NB classifiers according to the category hierarchies, which makes it possible to reduce the amount of computation necessary for classification without sacrificing accuracy.