A crucial issue in many classification applications is how to obtain the best possible classifier from a limited amount of labeled training data. Training data selection addresses this issue by selecting the most informative samples for training. In this work, we propose three data selection mechanisms based on fuzzy clustering: center-based selection, border-based selection, and hybrid selection. Center-based selection picks the samples with the highest degree of membership in each cluster; border-based selection picks the samples lying near the borders between clusters; hybrid selection combines the two. Compared with existing work, our methods require little computational effort and are independent of both the supervised learning algorithm and the initial labeled data. We implement the three selection mechanisms with fuzzy c-means and study their effects empirically on a set of UCI data sets. Experimental results indicate that, compared with random selection, hybrid selection consistently enhances learning performance across all the data sets, center-based selection improves performance on certain data sets, and border-based selection shows no significant improvement.
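The three selection mechanisms can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a basic fuzzy c-means routine, uses the maximum membership degree as the "centeredness" score, and uses the margin between the two largest memberships as the border criterion; the function names and the 50/50 split in the hybrid variant are illustrative choices.

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, n_iter=100, seed=0):
    """Basic fuzzy c-means; returns the (n_samples, c) membership matrix U."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)          # memberships sum to 1 per sample
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # distances from every sample to every cluster center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        p = 2.0 / (m - 1.0)
        # standard FCM membership update: u_ik = d_ik^{-p} / sum_j d_ij^{-p}
        U = (d ** -p) / np.sum(d ** -p, axis=1, keepdims=True)
    return U

def center_based(U, k):
    """Select the k samples with the highest membership in some cluster."""
    return np.argsort(-U.max(axis=1))[:k]

def border_based(U, k):
    """Select the k samples whose top-two memberships are closest,
    i.e. samples lying near a border between clusters."""
    s = np.sort(U, axis=1)
    margin = s[:, -1] - s[:, -2]
    return np.argsort(margin)[:k]

def hybrid(U, k):
    """Combine center-based and border-based picks (even split, illustrative)."""
    half = k // 2
    centers = center_based(U, half)
    taken = set(centers.tolist())
    borders = [i for i in border_based(U, k) if i not in taken][: k - half]
    return np.concatenate([centers, np.array(borders, dtype=int)])
```

The selected indices would then be labeled and passed to any supervised learner, which is what makes the scheme independent of the learning algorithm.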