Document classification using POS distribution

  • Authors:
  • Masato Shirai; Takao Miura

  • Affiliations:
  • Dept. of Elect. & Elect. Engr., HOSEI University, Koganei, Tokyo, Japan; Dept. of Elect. & Elect. Engr., HOSEI University, Koganei, Tokyo, Japan

  • Venue:
  • ADBIS'12 Proceedings of the 16th East European conference on Advances in Databases and Information Systems
  • Year:
  • 2012

Abstract

In this investigation, we discuss how to classify Japanese documents very quickly by focusing on Part-Of-Speech (POS) distribution rather than word distribution. The investigation makes two main contributions: a linear regression approach that models POS behavior in Japanese documents very well for classification, and a new, efficient classifier based on the Gaussian probability distribution, called the Gaussian classifier.
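
To make the idea of classifying by POS distribution concrete, the following is a minimal sketch, not the authors' method. It assumes each document has already been reduced to a list of POS tags by some tagger (not shown), uses a hypothetical four-tag set `POS_TAGS`, and fits an independent (diagonal) Gaussian per class over the POS proportions, which is a generic stand-in for the Gaussian classifier named in the abstract. The linear regression step is not sketched because the abstract gives no details of it. All function names here are illustrative.

```python
import math
from collections import Counter

# Hypothetical tag set; a real Japanese tagger would emit a richer one.
POS_TAGS = ["noun", "verb", "adjective", "particle"]


def pos_distribution(tags):
    """Turn a list of POS tags into relative frequencies over POS_TAGS."""
    counts = Counter(tags)
    total = sum(counts[t] for t in POS_TAGS) or 1  # avoid division by zero
    return [counts[t] / total for t in POS_TAGS]


def fit(docs_by_class, min_var=1e-4):
    """Estimate a per-class mean and variance for each POS proportion.

    min_var is an assumed variance floor that keeps the Gaussian
    well defined when a proportion is constant within a class.
    """
    model = {}
    for label, docs in docs_by_class.items():
        vectors = [pos_distribution(d) for d in docs]
        n = len(vectors)
        columns = list(zip(*vectors))
        means = [sum(col) / n for col in columns]
        variances = [
            max(sum((x - m) ** 2 for x in col) / n, min_var)
            for col, m in zip(columns, means)
        ]
        model[label] = (means, variances)
    return model


def log_likelihood(vector, means, variances):
    """Log density of vector under independent Gaussians, one per POS."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        for x, m, v in zip(vector, means, variances)
    )


def classify(model, tags):
    """Return the class whose Gaussian gives the highest log-likelihood."""
    vector = pos_distribution(tags)
    return max(model, key=lambda label: log_likelihood(vector, *model[label]))
```

Because a document is represented by only a handful of POS proportions instead of a vocabulary-sized word vector, both fitting and classification in this sketch cost time linear in the number of tags, which is in the spirit of the speed claim above.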