In this paper, a novel k-nearest neighbors (kNN) weighting strategy is proposed for handling class imbalance. When the data are highly imbalanced, a salient drawback of existing kNN algorithms is that the class with more frequent samples tends to dominate the neighborhood of a test instance regardless of the distance measure used, which leads to suboptimal classification performance on the minority class. To address this problem, we propose CCW (class confidence weights), which uses the probability of attribute values given class labels to weight prototypes in kNN. The main advantage of CCW is that it corrects the inherent bias toward the majority class in existing kNN algorithms under any distance measure. Theoretical analysis and comprehensive experiments confirm our claims.
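The weighting idea can be illustrated with a small sketch. The abstract does not say how the probability of attribute values given a class label is estimated, so the code below assumes an independent Gaussian per attribute and per class. This stand-in is chosen only for brevity, and the paper's own estimator may differ. Each prototype is weighted by the density of its own attributes under its own class, and the unweighted majority vote is replaced by a per-class sum of weights. All function names (`fit_class_gaussians`, `class_confidence_weights`, `weighted_knn_predict`) are hypothetical. This is not the authors' CCW implementation.

```python
import math
from collections import defaultdict


def fit_class_gaussians(X, y, eps=1e-6):
    """Per-class, per-attribute Gaussian fit (ASSUMPTION: naive independence, stand-in estimator)."""
    by_class = defaultdict(list)
    for x, label in zip(X, y):
        by_class[label].append(x)
    stats = {}
    for label, rows in by_class.items():
        n, d = len(rows), len(rows[0])
        means = [sum(r[j] for r in rows) / n for j in range(d)]
        # eps keeps each variance strictly positive for constant attributes
        variances = [sum((r[j] - means[j]) ** 2 for r in rows) / n + eps for j in range(d)]
        stats[label] = (means, variances)
    return stats


def log_likelihood(x, means, variances):
    """log p(x | class) under the independent-Gaussian model."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
        for xj, m, v in zip(x, means, variances)
    )


def class_confidence_weights(X, y):
    """Weight each prototype by the estimated density of its attributes given its own label."""
    stats = fit_class_gaussians(X, y)
    return [math.exp(log_likelihood(x, *stats[label])) for x, label in zip(X, y)]


def weighted_knn_predict(X, y, weights, query, k=3):
    """Sum the weights of the k nearest prototypes per class and return the heaviest class."""
    nearest = sorted((math.dist(x, query), i) for i, x in enumerate(X))[:k]
    votes = defaultdict(float)
    for _, i in nearest:
        votes[y[i]] += weights[i]
    return max(votes, key=votes.get)


# Toy data: majority class "neg" clusters near 0 but has two stray points near 3;
# minority class "pos" lives near 3.
X = [(0.0,), (0.1,), (-0.1,), (0.2,), (-0.2,), (3.0,), (3.15,), (2.6,), (3.5,)]
y = ["neg"] * 7 + ["pos"] * 2
w = class_confidence_weights(X, y)

print(weighted_knn_predict(X, y, [1.0] * len(X), (3.1,)))  # plain majority vote: neg
print(weighted_knn_predict(X, y, w, (3.1,)))               # confidence-weighted vote: pos
```

For the query at 3.1, the three nearest prototypes are the two stray majority points and one minority point, so an unweighted vote favours "neg". Under the assumed Gaussian model, the stray majority points sit in the tail of their own class density and receive small weights. The minority points lie close to their class center and keep large weights, so the weighted vote flips to "pos". The distance function itself is left unchanged, which mirrors the claim that the correction applies under any distance measure.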