Text Categorization Based on Regularized Linear Classification Methods

  • Authors:
  • Tong Zhang; Frank J. Oles

  • Affiliations:
  • Mathematical Sciences Department, IBM T.J. Watson Research Center, Yorktown Heights, NY 10598. tzhang@watson.ibm.com
  • Mathematical Sciences Department, IBM T.J. Watson Research Center, Yorktown Heights, NY 10598. oles@watson.ibm.com

  • Venue:
  • Information Retrieval
  • Year:
  • 2001


Abstract

A number of linear classification methods, such as the linear least squares fit (LLSF), logistic regression, and support vector machines (SVMs), have been applied to text categorization problems. These methods are similar in that they all find hyperplanes that approximately separate a class of document vectors from its complement. However, support vector machines have so far been considered special because they have been demonstrated to achieve state-of-the-art performance. It is therefore worthwhile to understand whether such good performance is unique to the SVM design or whether it can also be achieved by other linear classification methods. In this paper, we compare a number of known linear classification methods, as well as some variants, in the framework of regularized linear systems. We discuss the statistical and numerical properties of these algorithms, with a focus on text categorization. We also provide numerical experiments that illustrate these algorithms on a number of datasets.
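To make the idea of a shared "regularized linear system" framework concrete, here is a minimal sketch under stated assumptions. It assumes the common formulation in which a weight vector w is chosen to minimize (1/n) * sum_i f(y_i * w.x_i) + lam * ||w||^2, with labels y_i in {+1, -1}, and in which LLSF, logistic regression, and SVMs differ only in the per-example loss f (squared loss, logistic loss, and hinge loss, respectively). The exact loss variants, regularization constants, and solvers used in the paper may differ. The plain (sub)gradient descent solver is a swapped-in technique chosen for brevity, and the names `LOSSES`, `train`, `objective`, `XS`, and `YS` are hypothetical.

```python
import math

# Per-example loss f(v) on the margin v = y * (w . x).
# Standard textbook forms (assumption: the paper's variants may differ).
def least_squares(v):  # LLSF
    return (1.0 - v) ** 2

def logistic(v):  # logistic regression
    return math.log1p(math.exp(-v))

def hinge(v):  # SVM
    return max(0.0, 1.0 - v)

# Derivatives of each loss (a subgradient for the hinge loss at v == 1).
def d_least_squares(v):
    return -2.0 * (1.0 - v)

def d_logistic(v):
    return -1.0 / (1.0 + math.exp(v))

def d_hinge(v):
    return -1.0 if v < 1.0 else 0.0

LOSSES = {
    "llsf": (least_squares, d_least_squares),
    "logistic": (logistic, d_logistic),
    "svm": (hinge, d_hinge),
}

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def objective(w, xs, ys, method, lam=0.01):
    """Regularized empirical loss: (1/n) * sum f(y * w.x) + lam * ||w||^2."""
    f, _ = LOSSES[method]
    n = len(xs)
    return sum(f(y * dot(w, x)) for x, y in zip(xs, ys)) / n + lam * dot(w, w)

def train(xs, ys, method, lam=0.01, lr=0.1, epochs=500):
    """Hypothetical helper: minimize `objective` by plain (sub)gradient descent."""
    _, df = LOSSES[method]
    n, d = len(xs), len(xs[0])
    w = [0.0] * d
    for _ in range(epochs):
        # Gradient of the regularizer lam * ||w||^2.
        grad = [2.0 * lam * wj for wj in w]
        # Chain rule: d/dw f(y * w.x) = f'(y * w.x) * y * x.
        for x, y in zip(xs, ys):
            g = df(y * dot(w, x)) * y / n
            for j in range(d):
                grad[j] += g * x[j]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

# Toy "document vectors": two features plus a constant 1.0 acting as a bias.
XS = [[1.0, 2.0, 1.0], [2.0, 1.0, 1.0], [-1.0, -2.0, 1.0], [-2.0, -1.0, 1.0]]
YS = [1, 1, -1, -1]

for method in LOSSES:
    w = train(XS, YS, method)
    preds = [1 if dot(w, x) > 0 else -1 for x in XS]
    print(method, [round(wj, 3) for wj in w], preds)
```

Because only the loss function changes between methods, the same solver and the same hyperplane decision rule sign(w.x) serve all three. This is the sense in which the methods can be compared side by side within one framework.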