A new differential LSI space-based probabilistic document classifier

  • Authors:
  • Liang Chen;Naoyuki Tokuda;Akira Nagai

  • Affiliations:
  • Computer Science Department, University of Northern British Columbia, Prince George, BC, Canada, V2N 4Z9; Sunflare Company, Shinjuku-Hirose Bldg., 7 Yotsuya 4-chome, Shinjuku-ku, Tokyo, Japan 160-0004; Advanced Media Network Center, Utsunomiya University, Utsunomiya, Tochigi, Japan 321-8585

  • Venue:
  • Information Processing Letters
  • Year:
  • 2003

Abstract

We have developed a new, effective probabilistic classifier for document classification by introducing the concepts of differential document vectors and DLSI (differential latent semantic indexing) spaces. Combining the projections onto, and the distances to, the DLSI spaces derived from the differential document vectors improves the adaptability of the LSI (latent semantic indexing) method by capturing the unique characteristics of documents. Using intra- and extra-document statistics, both a simple a posteriori calculation on a small example and an experiment on the large Reuters-21578 database show that the DLSI space-based probabilistic classifier achieves better classification performance than the LSI space-based classifier.
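The abstract names the ingredients (differential document vectors, DLSI spaces, projections onto and distances to those spaces, intra- and extra-document statistics) but gives no formulas. The Python sketch below is one minimal reading under stated assumptions, not the authors' exact method. It assumes that an intra-differential vector is a document minus its own class centroid and an extra-differential vector is a document minus another class's centroid; that each DLSI space is spanned by the top-k left singular vectors of the corresponding matrix; and that a differential vector's likelihood is a Gaussian estimate combining its in-space projection with its squared distance to the space, using `rho` (the mean variance of the discarded directions) for the residual. A query goes to the class whose differential vector has the highest posterior of being intra-class. The function names, the parameters `k_intra`, `k_extra`, and `prior_intra`, and the six-document toy matrix are all hypothetical.

```python
import numpy as np

# Hypothetical sketch: centroid-based differential vectors and a Gaussian
# subspace density are assumptions, not formulas taken from the paper.


def fit_subspace(diffs: np.ndarray, k: int):
    """SVD of a (terms x samples) matrix of differential vectors.

    Returns the top-k basis, the variances along it, and rho, the mean
    variance of the discarded directions (k must not exceed the rank)."""
    n_terms, n_samples = diffs.shape
    u, s, _ = np.linalg.svd(diffs, full_matrices=False)
    var = s ** 2 / n_samples                 # variance along each singular direction
    rest = n_terms - k
    rho = float(var[k:].sum() / rest) if rest > 0 else 1.0
    return u[:, :k], var[:k], max(rho, 1e-12)


def log_density(x, basis, lam, rho) -> float:
    """Log density of x combining its projection onto the subspace (y)
    with its squared distance to the subspace (eps2)."""
    n, k = basis.shape
    y = basis.T @ x                          # coordinates inside the subspace
    eps2 = max(float(x @ x - y @ y), 0.0)    # squared distance to the subspace
    in_space = -0.5 * np.sum(y ** 2 / lam) - 0.5 * np.sum(np.log(2 * np.pi * lam))
    out_space = -eps2 / (2 * rho) - 0.5 * (n - k) * np.log(2 * np.pi * rho)
    return float(in_space + out_space)


def train(docs, labels, k_intra, k_extra):
    """docs: (terms x documents) matrix; labels: one class label per column."""
    classes = sorted(set(labels))
    centroids = {c: docs[:, [i for i, l in enumerate(labels) if l == c]].mean(axis=1)
                 for c in classes}
    # Intra-differential vectors: document minus its own class centroid.
    intra = [docs[:, i] - centroids[l] for i, l in enumerate(labels)]
    # Extra-differential vectors: document minus each other class's centroid.
    extra = [docs[:, i] - centroids[c]
             for i, l in enumerate(labels) for c in classes if c != l]
    return (centroids,
            fit_subspace(np.column_stack(intra), k_intra),
            fit_subspace(np.column_stack(extra), k_extra))


def classify(x, model, prior_intra=0.5):
    """Return (class, posterior) for the class whose differential vector
    is most likely to be intra-class."""
    centroids, intra, extra = model
    best, best_post = None, -1.0
    for c, mu in centroids.items():
        d = x - mu
        li = log_density(d, *intra) + np.log(prior_intra)
        le = log_density(d, *extra) + np.log(1.0 - prior_intra)
        post = float(np.exp(li - np.logaddexp(li, le)))  # Bayes' rule: P(intra | d)
        if post > best_post:
            best, best_post = c, post
    return best, best_post


# Toy corpus: 4 terms, class A uses terms 0-1, class B uses terms 2-3.
docs = np.array([[2, 1, 2, 0, 0, 0],
                 [1, 2, 2, 0, 0, 0],
                 [0, 0, 0, 2, 1, 2],
                 [0, 0, 0, 1, 2, 2]], dtype=float)
labels = ["A", "A", "A", "B", "B", "B"]
model = train(docs, labels, k_intra=2, k_extra=1)
print(classify(np.array([3.0, 1.0, 0.0, 0.0]), model))  # label "A"
```

Keeping separate subspaces for intra- and extra-class differences lets the classifier weigh two hypotheses about the same differential vector through Bayes' rule; on a real corpus such as Reuters-21578, the ranks `k_intra` and `k_extra` and the residual variance would likely need tuning.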