Improving automatic Chinese text categorization by error correction

  • Authors:
  • Jhy-Jong Tsay;Jing-Doo Wang

  • Affiliations:
  • Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi, Taiwan 62107, ROC;Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi, Taiwan 62107, ROC

  • Venue:
  • IRAL '00 Proceedings of the fifth international workshop on Information retrieval with Asian languages
  • Year:
  • 2000

Abstract

In this paper, we use the misclassified news articles in the training data as feedback to improve classification accuracy. We isolate the misclassified articles from their original classes to form new subclasses, and modify the Rocchio linear classifier to build prototype vectors from these subclasses, so that a class can be represented by more than one centroid. We propose two criteria, an error threshold and an entropy threshold, for deciding whether isolating a group of misclassified articles is worthwhile. Experimental results show that our approach improves Rocchio's micro-level accuracy and achieves performance similar to kNN, but with less classification time. In addition, the entropy of the misclassified articles reveals the ambiguity between classes and lets us sketch a diagram of the relationships between classes, as a suggestion for reorganizing the news class structure in the future.
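
The subclass idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a simplified centroid-only variant of Rocchio (no negative-example term), cosine similarity, and dense feature vectors. All names (`centroid`, `cosine`, `predict`, `train_with_error_subclasses`) are hypothetical, and the `min_errors` cutoff is only a stand-in for the paper's error threshold, whose exact definition the abstract does not give. The entropy threshold is not sketched for the same reason.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def predict(prototypes, x):
    """Return the label of the prototype most similar to x.

    prototypes is a list of (label, vector) pairs; a label may appear
    more than once, i.e. a class may have several centroids.
    """
    return max(prototypes, key=lambda p: cosine(p[1], x))[0]


def train_with_error_subclasses(docs, labels, min_errors=2):
    """Build prototypes, splitting off groups of misclassified training docs.

    min_errors is a hypothetical stand-in for the paper's error threshold.
    """
    # Step 1: one centroid per class (simplified, centroid-only Rocchio).
    by_class = defaultdict(list)  # label -> indices of its documents
    for i, y in enumerate(labels):
        by_class[y].append(i)
    base = [(y, centroid([docs[i] for i in idx])) for y, idx in by_class.items()]

    # Step 2: classify the training data and group errors by (true, predicted).
    errors = defaultdict(list)
    for i, (x, y) in enumerate(zip(docs, labels)):
        guess = predict(base, x)
        if guess != y:
            errors[(y, guess)].append(i)

    # Step 3: isolate only error groups that reach the cutoff.
    isolated = {k: idx for k, idx in errors.items() if len(idx) >= min_errors}
    moved = {i for idx in isolated.values() for i in idx}

    # Step 4: recompute class centroids without the isolated documents and
    # add one extra centroid per isolated group, keeping its true label.
    prototypes = []
    for y, idx in by_class.items():
        kept = [docs[i] for i in idx if i not in moved]
        if kept:
            prototypes.append((y, centroid(kept)))
    for (true_label, _), idx in isolated.items():
        prototypes.append((true_label, centroid([docs[i] for i in idx])))
    return prototypes
```

Classification cost grows with the number of prototypes rather than the number of training documents, which is consistent with the abstract's claim of lower classification time than kNN.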