Using Clustering and Co-training to Boost Classification Performance

  • Authors:
  • Antonia Kyriakopoulou

  • Affiliations:
  • -

  • Venue:
  • ICTAI '07 Proceedings of the 19th IEEE International Conference on Tools with Artificial Intelligence - Volume 02
  • Year:
  • 2007

Abstract

This paper shows that the performance of a linear SVM classifier can be improved by utilizing meta-information derived from clustering. Clustering aims to discover extra knowledge about the structure of the whole dataset (both the training and the testing set). A co-training algorithm is introduced that uses clustering as a complementary step to text classification. At each iteration of the algorithm, the clustering phase augments the feature space with a new meta-feature that reflects the cluster membership of each document, and the classification phase introduces another meta-feature that indicates class membership. Experimental results obtained on widely used datasets demonstrate the effectiveness of the proposed approaches, especially for small training sets.
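
The alternating loop described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not name the clustering algorithm, the encoding of the meta-features, or the stopping rule, so everything below is an assumption. Plain k-means stands in for the clustering step, a nearest-centroid classifier stands in for the linear SVM to keep the example dependency-free, each meta-feature is encoded as a single numeric column, class labels are assumed to be numeric, and the loop runs for a fixed number of rounds. All function names (`kmeans`, `nearest_centroid_predict`, `co_train`) are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(rows):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def kmeans(rows, k, iters=20, seed=0):
    """Plain k-means (assumed clustering step); returns a cluster index per row."""
    rng = random.Random(seed)
    centers = rng.sample(rows, k)
    assign = [0] * len(rows)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(r, centers[c])) for r in rows]
        for c in range(k):
            members = [r for r, a in zip(rows, assign) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = mean(members)
    return assign


def nearest_centroid_predict(train_rows, train_labels, rows):
    """Stand-in for the linear SVM: assign the label of the nearest class centroid."""
    classes = sorted(set(train_labels))
    centroids = {
        c: mean([r for r, y in zip(train_rows, train_labels) if y == c])
        for c in classes
    }
    return [min(classes, key=lambda c: sq_dist(r, centroids[c])) for r in rows]


def co_train(train_rows, train_labels, test_rows, k=2, rounds=3):
    """Alternate clustering and classification, growing the feature space each round."""
    all_rows = [list(r) for r in train_rows + test_rows]
    n_train = len(train_rows)
    preds = []
    for step in range(rounds):
        # Clustering phase: cluster training and testing documents together,
        # then append the cluster index as a new meta-feature.
        clusters = kmeans(all_rows, k, seed=step)
        all_rows = [r + [float(c)] for r, c in zip(all_rows, clusters)]

        # Classification phase: train on the labeled rows, predict the rest,
        # then append class membership (true or predicted) as a meta-feature.
        preds = nearest_centroid_predict(
            all_rows[:n_train], train_labels, all_rows[n_train:]
        )
        class_feature = list(train_labels) + preds
        all_rows = [r + [float(y)] for r, y in zip(all_rows, class_feature)]
    return preds
```

Calling `co_train(train_rows, train_labels, test_rows)` returns one predicted label per test row. Because clustering runs over the training and testing rows together, the unlabeled test documents influence the features seen by the classifier, which is the mechanism the abstract credits for the gains on small training sets.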