Automatic Evaluation of Document Classification Using N-Gram Statistics

  • Authors:
  • Dongjin Choi;Byeongkyu Ko;Eunji Lee;Myunggwon Hwang;Pankoo Kim

  • Venue:
  • NBIS '12 Proceedings of the 2012 15th International Conference on Network-Based Information Systems
  • Year:
  • 2012

Abstract

With the growth of World Wide Web technologies, people are now surrounded by trillions of web pages, and the size of the web continues to increase dramatically. As a result, it is becoming more difficult to find the web documents relevant to what users want to read. Classifying documents into predefined categories is one of the most important tasks in the field of Natural Language Processing. Over the years, many statistical and linguistic approaches have been applied to improve on traditional classifiers. However, the problem remains unsolved: there is not yet a perfect solution for making a machine understand human language, and every possibility for making machines think as humans do has to be considered. In this paper, we propose a method for classifying textual documents using n-gram co-occurrence statistics, which show great potential for finding similarities between given documents. We also compare the proposed method with the traditional method suggested by Keselj. This paper covers only simple approaches and still needs more sophisticated experiments. However, the proposed method performs better than the Keselj approach.
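
To make the general idea of n-gram based classification concrete, the following is a minimal sketch of a profile-based classifier. It is not the method proposed in the paper, whose details the abstract does not give. Every choice here is an assumption made for illustration: character trigrams, a profile size of 500, and a sum of squared relative differences as the dissimilarity measure (a form commonly used in profile-based n-gram work). The function names are hypothetical.

```python
from collections import Counter


def ngram_profile(text, n=3, top_k=500):
    """Map the top_k most frequent character n-grams to normalized frequencies."""
    # Lowercase and collapse runs of whitespace so formatting does not affect counts.
    text = " ".join(text.lower().split())
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    if total == 0:
        return {}
    return {g: c / total for g, c in grams.most_common(top_k)}


def dissimilarity(p, q):
    """Sum of squared relative frequency differences over the union of two profiles.

    Assumed measure for illustration; an n-gram missing from one profile
    contributes the maximum value of 4.
    """
    score = 0.0
    for g in set(p) | set(q):
        a, b = p.get(g, 0.0), q.get(g, 0.0)
        score += (2.0 * (a - b) / (a + b)) ** 2
    return score


def classify(document, category_texts, n=3, top_k=500):
    """Return the label whose category profile is least dissimilar to the document."""
    doc_profile = ngram_profile(document, n, top_k)
    profiles = {
        label: ngram_profile(text, n, top_k)
        for label, text in category_texts.items()
    }
    return min(profiles, key=lambda label: dissimilarity(doc_profile, profiles[label]))


# Toy training text per category (invented purely for this example).
categories = {
    "sports": "the team won the football match and the players scored goals",
    "finance": "the bank raised interest rates and the stock market fell",
}
predicted = classify("the players scored in the football match", categories)
```

In this sketch each category is represented by a single profile built from its training text, and a document is assigned to the category with the smallest dissimilarity. A word-level or co-occurrence-based variant would change only how `ngram_profile` builds its counts.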