Shape pattern matching: A tool to cluster unstructured text documents
Journal of Computational Methods in Sciences and Engineering - Special Supplement Issue in Section A and B: Selected Papers from the ISCA International Conference on Software Engineering and Data Engineering, 2009
This paper discusses document classification problems in text mining from the viewpoint of asymptotic statistical analysis. By formulating a text-mining problem as a statistical hypothesis test, some interesting properties can be visualized. In text mining, several heuristics are applied in practical analysis because they have proved effective experimentally in many case studies. A theoretical explanation of the performance of text mining techniques is therefore needed, and this approach provides a clear account of it. A distance measure in word-vector space is used to classify the documents. In this paper, the performance of this distance measure is also analyzed from the new viewpoint of asymptotic analysis.
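The abstract does not say which distance measure or which classification rule is used. The sketch below is therefore only an illustration of the general idea of classifying documents by distance in word-vector space, under stated assumptions: each document becomes a relative term-frequency vector over a fixed vocabulary, distance is Euclidean, and a new document is assigned to the class whose centroid is nearest (a nearest-centroid rule). The helper names (`word_vector`, `centroid`, `classify`) and the toy training data are hypothetical, and the sketch does not reproduce the paper's asymptotic analysis.

```python
from collections import Counter
import math

def word_vector(text, vocab):
    """Relative term-frequency vector of `text` over a fixed vocabulary (assumed representation)."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[w] / total for w in vocab]

def euclidean(u, v):
    """Euclidean distance, assumed here as the distance measure."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def classify(doc, labeled_docs, vocab):
    """Assign `doc` to the class whose centroid is nearest in word-vector space."""
    by_class = {}
    for text, label in labeled_docs:
        by_class.setdefault(label, []).append(word_vector(text, vocab))
    x = word_vector(doc, vocab)
    return min(by_class, key=lambda c: euclidean(x, centroid(by_class[c])))

# Hypothetical toy corpus, for illustration only.
training = [
    ("the match ended with a late goal", "sports"),
    ("the team scored a goal", "sports"),
    ("the market fell as stocks dropped", "finance"),
    ("investors sold stocks in the market", "finance"),
]
vocab = sorted({w for text, _ in training for w in text.split()})

print(classify("stocks and the market", training, vocab))
print(classify("a late goal", training, vocab))
```

Words outside the training vocabulary (such as "and" above) are simply ignored by this representation; other choices of distance, such as cosine distance, would fit the same structure by swapping out `euclidean`.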