Pruning Training Corpus to Speedup Text Classification

  • Authors:
  • Jihong Guan; Shuigeng Zhou

  • Venue:
  • DEXA '02 Proceedings of the 13th International Conference on Database and Expert Systems Applications
  • Year:
  • 2002

Abstract

With the rapid growth of online text information, efficient text classification has become one of the key techniques for organizing and processing text repositories. This paper proposes an efficient text classification approach based on pruning the training corpus. With the proposed approach, noisy and superfluous documents in the training corpus can be cut down drastically, which substantially improves classification efficiency. An effective algorithm for training corpus pruning is presented. Experiments on the commonly used Reuters benchmark validate the effectiveness and efficiency of the proposed approach.