Data pruning identifies the noisy instances of a data set and removes them in order to improve the generalization of a learning algorithm. It has been well studied in supervised classification, where the identification and removal of noisy instances are guided by the available instance labels. However, to the best of the authors' knowledge, very little work has been done on data pruning for unsupervised clustering. This paper addresses the problem of data pruning for unsupervised clustering, where the labels of instances are unknown beforehand. We claim that unsupervised data pruning can benefit the clustering of noisy data. We propose a feasible approach, termed unsupervised Data Pruning using Ensembles of multiple Clusterers (DPEC), to identify the noisy instances of a data set. DPEC checks every instance of a data set and identifies noisy instances by using an ensemble of clustering results produced by different clusterers on the same data set. We test the performance of DPEC on several real data sets with artificial noise. Experimental results demonstrate that DPEC is often able to improve the accuracy and robustness of the clustering algorithm.
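The abstract does not say how DPEC combines the clustering results, so the following is only a minimal sketch of one plausible way to flag noisy instances from an ensemble, not the authors' method. It assumes a co-association matrix (the fraction of clusterings in which two instances share a cluster) and a hypothetical instability score that is highest when an instance is grouped with the same neighbors in only about half of the clusterings. The function names, the score, and the `n_flag` parameter are all illustrative assumptions.

```python
from itertools import combinations


def co_association(labelings):
    """Fraction of clusterings in which each pair of instances shares a cluster.

    `labelings` is a list of label lists, one per clusterer, all of equal length.
    """
    n = len(labelings[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1.0
                counts[j][i] += 1.0
    m = len(labelings)
    return [[value / m for value in row] for row in counts]


def instability_scores(co):
    """Assumed score: 0 when the ensemble always agrees about an instance,
    1 when every pairing involving it is a coin flip (co-association 0.5)."""
    n = len(co)
    scores = []
    for i in range(n):
        total = sum(1.0 - abs(2.0 * co[i][j] - 1.0) for j in range(n) if j != i)
        scores.append(total / (n - 1))
    return scores


def flag_noisy(labelings, n_flag):
    """Return the indices of the `n_flag` most unstable instances, sorted."""
    scores = instability_scores(co_association(labelings))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_flag])


# Hand-made example: four clusterers, five instances.
# Instances 0-1 and 2-3 are always grouped consistently; instance 4 switches sides.
labelings = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
]
print(flag_noisy(labelings, n_flag=1))  # prints [4]
```

Because the score depends only on whether pairs of instances share a cluster, it is unaffected by how each clusterer names its clusters, so results from different algorithms or different numbers of clusters can be mixed. Instances flagged this way would then be removed before the final clustering is run.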