Improving supervised learning performance by using fuzzy clustering method to select training data

Authors:
Donghai Guan;Weiwei Yuan;Young-Koo Lee;Andrey Gavrilov;Sungyoung Lee
Affiliations:
Department of Computer Engineering, Kyung Hee University, Korea (all authors). Corresponding author: Young-Koo Lee (Tel.: +82 31 201 3732; E-mail: yklee@khu.ac.kr)
Venue:
Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology - Fuzzy theory and technology with applications
Year:
2008

Abstract

A crucial issue in many classification applications is how to obtain the best possible classifier from a limited amount of labeled training data. Training data selection addresses this issue by selecting the most informative data for training. In this work, we propose three data selection mechanisms based on fuzzy clustering: center-based selection, border-based selection, and hybrid selection. Center-based selection chooses the samples with a high degree of membership in each cluster as training data. Border-based selection chooses the samples near the borders between clusters. Hybrid selection combines center-based and border-based selection. Compared with existing work, our methods require little computational effort. Moreover, they are independent of both the supervised learning algorithm and the initial labeled data. We implement the data selection mechanisms with fuzzy c-means and study their effects empirically on a set of UCI data sets. The experimental results indicate that, compared with random selection, hybrid selection effectively enhances learning performance on all the data sets, center-based selection performs better on certain data sets, and border-based selection shows no significant improvement.
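
The three selection mechanisms described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact selection criteria, so everything below is an assumption made for illustration: the function names (fuzzy_c_means, center_based, border_based, hybrid), the fuzzifier m = 2, the use of the gap between the two largest memberships as a measure of closeness to a border, and the even split of the labeling budget in the hybrid variant. It is not the authors' implementation.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Plain fuzzy c-means. Returns (centers, U), U has shape (n_samples, n_clusters)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)  # memberships of each sample sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance from every sample to every center; the floor avoids division by zero.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def center_based(U, n_select):
    """Assumed criterion: in each cluster, take the samples with the highest membership."""
    n_clusters = U.shape[1]
    per_cluster = max(1, n_select // n_clusters)
    labels = U.argmax(axis=1)
    chosen = []
    for k in range(n_clusters):
        members = np.flatnonzero(labels == k)
        ranked = members[np.argsort(-U[members, k])]
        chosen.extend(ranked[:per_cluster].tolist())
    return chosen


def border_based(U, n_select):
    """Assumed criterion: smallest gap between the two largest memberships."""
    top2 = np.sort(U, axis=1)[:, -2:]
    gap = top2[:, 1] - top2[:, 0]
    return np.argsort(gap)[:n_select].tolist()  # most ambiguous samples first


def hybrid(U, n_select):
    """Assumed combination: about half the budget from centers, the rest from borders."""
    picked = center_based(U, n_select // 2)
    for i in border_based(U, U.shape[0]):
        if len(picked) >= n_select:
            break
        if i not in picked:
            picked.append(i)
    return picked


# Toy data: two Gaussian blobs standing in for an unlabeled pool.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(4.0, 1.0, (50, 2))])
_, U = fuzzy_c_means(X, n_clusters=2)
to_label = hybrid(U, n_select=10)  # indices of the samples to send for labeling
print(to_label)
```

The selected indices depend only on the membership matrix U, not on any classifier or on previously labeled samples, which mirrors the independence property claimed in the abstract. Any supervised learner could then be trained on the samples at those indices once they are labeled.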