Instance selection in text classification using the silhouette coefficient measure

Authors:
Debangana Dey;Thamar Solorio;Manuel Montes y Gómez;Hugo Jair Escalante
Affiliations:
Department of Computer and Information Sciences, University of Alabama at Birmingham, Birmingham, AL;Department of Computer and Information Sciences, University of Alabama at Birmingham, Birmingham, AL;Department of Computer and Information Sciences, University of Alabama at Birmingham, Birmingham, AL;National Institute of Astrophysics, Optics and Electronics, Puebla, Mexico
Venue:
MICAI'11 Proceedings of the 10th Mexican international conference on Advances in Artificial Intelligence - Volume Part I
Year:
2011

Abstract

The paper proposes the use of the Silhouette Coefficient (SC) as a ranking measure to perform instance selection in text classification. Our selection criterion was to keep instances with mid-range SC values while removing the instances with high and low SC values. We evaluated our hypothesis across three well-known datasets and various machine learning algorithms. The results show that our method helps to achieve the best trade-off between classification accuracy and training time.
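
The selection rule described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes that class labels play the role of clusters when computing the silhouette, that Euclidean distance is used (text classification work often uses cosine distance instead), and that the cutoffs `low` and `high` are hypothetical parameters, since the abstract does not state what counts as mid-range.

```python
import math


def silhouette_values(points, labels):
    """Per-instance silhouette coefficient, treating class labels as clusters.

    Assumption: Euclidean distance over dense numeric vectors.
    """
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    classes = sorted(set(labels))
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Mean distance from instance i to the members of each class,
        # leaving instance i itself out of its own class.
        mean_dist = {}
        for c in classes:
            ds = [
                dist(p, q)
                for j, (q, lab) in enumerate(zip(points, labels))
                if lab == c and j != i
            ]
            if ds:
                mean_dist[c] = sum(ds) / len(ds)

        a = mean_dist.get(own)  # cohesion: own-class mean distance
        others = [d for c, d in mean_dist.items() if c != own]
        if a is None or not others:
            # Singleton class or only one class: silhouette set to 0.
            scores.append(0.0)
            continue
        b = min(others)  # separation: nearest other class
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def select_mid_range(points, labels, low=0.2, high=0.8):
    """Return indices of instances whose silhouette lies in [low, high].

    `low` and `high` are hypothetical cutoffs chosen for illustration.
    """
    scores = silhouette_values(points, labels)
    return [i for i, s in enumerate(scores) if low <= s <= high]
```

Under these assumptions, instances with a silhouette near 1 sit deep inside their class and instances with a low or negative silhouette sit closer to another class. The function keeps only the indices in between, and the reduced training set is then built from those indices.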