Statistical Evaluation of Measure and Distance on Document Classification Problems in Text Mining

Authors:
Masayuki Goto; Takashi Ishida; Shigeichi Hirasawa
Affiliations:
Musashi Institute of Technology; Waseda University; Waseda University
Venue:
CIT '07 Proceedings of the 7th IEEE International Conference on Computer and Information Technology
Year:
2007




Abstract

This paper discusses document classification problems in text mining from the viewpoint of asymptotic statistical analysis. By formulating a statistical hypothesis test that is specific to a text mining problem, some interesting properties can be made visible. In text mining, several heuristics are applied in practical analysis because of their empirical effectiveness in many case studies. A theoretical explanation of the performance of text mining techniques is needed, and this approach gives a clear view of it. Documents are classified using a distance measure in the word vector space. This paper also analyzes the performance of the distance measure from the new viewpoint of asymptotic analysis.
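The abstract does not say which distance measure or classification rule the authors analyze. As a minimal sketch only, assuming a bag-of-words term-frequency representation and a nearest-centroid rule (both are our assumptions, not necessarily the paper's method), classifying documents by distance in word vector space can look like the following. All function names are hypothetical, and Euclidean and cosine distance are shown only as two common example measures.

```python
import math
from collections import Counter

# Hypothetical helpers: a nearest-centroid classifier over term-frequency
# vectors. This is an illustrative assumption, not the authors' method.

def tf_vector(text, vocab):
    """Map a document to raw term counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]  # words outside vocab are ignored

def euclidean(u, v):
    """Euclidean distance; sensitive to document length on raw counts."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_distance(u, v):
    """1 - cosine similarity; depends only on vector direction."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (nu * nv)

def centroids(train, vocab):
    """train: dict label -> list of texts; returns label -> mean TF vector."""
    result = {}
    for label, docs in train.items():
        vecs = [tf_vector(d, vocab) for d in docs]
        n = len(vecs)
        result[label] = [sum(col) / n for col in zip(*vecs)]
    return result

def classify(text, cents, vocab, distance=cosine_distance):
    """Assign the label whose centroid is nearest under `distance`."""
    x = tf_vector(text, vocab)
    return min(cents, key=lambda label: distance(x, cents[label]))
```

For example, with a toy training set such as `{"sports": [...], "finance": [...]}`, build the vocabulary from all training words, compute `centroids(train, vocab)`, and call `classify(query, cents, vocab, distance=euclidean)` to compare the two measures. On raw counts, Euclidean distance is affected by document length while cosine distance is not; this kind of difference between measures is what a theoretical analysis of distance-based classification would need to explain.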