Statistical Evaluation of Measure and Distance on Document Classification Problems in Text Mining

  • Authors:
  • Masayuki Goto; Takashi Ishida; Shigeichi Hirasawa

  • Affiliations:
  • Musashi Institute of Technology; Waseda University; Waseda University

  • Venue:
  • CIT '07 Proceedings of the 7th IEEE International Conference on Computer and Information Technology
  • Year:
  • 2007

Abstract

This paper discusses document classification problems in text mining from the viewpoint of asymptotic statistical analysis. By formulating a statistical hypothesis test specified as a text mining problem, some interesting properties can be made visible. In text mining, several heuristics are applied in practical analysis because of their experimental effectiveness in many case studies. A theoretical explanation of the performance of text mining techniques is therefore required, and this approach gives a clear picture of it. A distance measure in word vector space is used to classify documents. In this paper, the performance of the distance measure is also analyzed from the new viewpoint of asymptotic analysis.