Instance selection in text classification using the silhouette coefficient measure

  • Authors:
  • Debangana Dey; Thamar Solorio; Manuel Montes y Gómez; Hugo Jair Escalante

  • Affiliations:
  • Department of Computer and Information Sciences, University of Alabama at Birmingham, Birmingham, AL; Department of Computer and Information Sciences, University of Alabama at Birmingham, Birmingham, AL; Department of Computer and Information Sciences, University of Alabama at Birmingham, Birmingham, AL; National Institute of Astrophysics, Optics and Electronics, Puebla, Mexico

  • Venue:
  • MICAI'11 Proceedings of the 10th Mexican international conference on Advances in Artificial Intelligence - Volume Part I
  • Year:
  • 2011

Abstract

This paper proposes the Silhouette Coefficient (SC) as a ranking measure for instance selection in text classification. The selection criterion keeps instances with mid-range SC values and removes instances with high or low SC values. We evaluated this hypothesis on three well-known datasets using various machine learning algorithms. The results show that the method helps to achieve the best trade-off between classification accuracy and training time.
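
The selection criterion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: treating each class label as a cluster, the Euclidean distance, the toy one-dimensional vectors, the function names `silhouette_scores` and `select_mid_range`, and the cut-offs `low` and `high` are all assumptions made here for illustration. The abstract does not state the thresholds or the document representation that were actually used.

```python
import math


def silhouette_scores(vectors, labels):
    """Per-instance silhouette coefficient, treating each class label as a
    cluster (an assumption of this sketch)."""

    def dist(u, v):
        # Euclidean distance; the metric is an illustrative choice.
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    scores = []
    for i, (x, y) in enumerate(zip(vectors, labels)):
        # Accumulate distances from x to every other instance, grouped by class.
        sums, counts = {}, {}
        for j, (z, c) in enumerate(zip(vectors, labels)):
            if i == j:
                continue
            sums[c] = sums.get(c, 0.0) + dist(x, z)
            counts[c] = counts.get(c, 0) + 1
        others = [sums[c] / counts[c] for c in sums if c != y]
        if y not in sums or not others:
            # Singleton class, or only one class present: use 0 by convention.
            scores.append(0.0)
            continue
        a = sums[y] / counts[y]  # mean distance to instances of the same class
        b = min(others)          # mean distance to the nearest other class
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def select_mid_range(vectors, labels, low, high):
    """Return indices of instances whose SC lies in [low, high].
    Both cut-offs are hypothetical parameters of this sketch."""
    scores = silhouette_scores(vectors, labels)
    return [i for i, s in enumerate(scores) if low <= s <= high]


# Toy data: index 4 sits among class "b" but is labeled "a", so its SC is low.
vectors = [(0.0,), (1.0,), (10.0,), (11.0,), (10.5,)]
labels = ["a", "a", "b", "b", "a"]

kept = select_mid_range(vectors, labels, low=0.0, high=0.85)
print(kept)
```

On this toy data the instance at index 4 receives a strongly negative SC and is dropped as a likely outlier, and the instance at index 3 is dropped because its SC is above the upper cut-off. The remaining indices form the reduced training set that would then be passed to a classifier.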