Class confidence weighted kNN algorithms for imbalanced data sets

Authors:
Wei Liu; Sanjay Chawla
Affiliations:
School of Information Technologies, University of Sydney; School of Information Technologies, University of Sydney
Venue:
PAKDD '11: Proceedings of the 15th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, Volume Part II
Year:
2011

Abstract

In this paper, we propose a novel k-nearest neighbors (kNN) weighting strategy for handling class imbalance. On highly imbalanced data, a salient drawback of existing kNN algorithms is that the more frequent class tends to dominate the neighborhood of a test instance regardless of the distance measure used, which leads to suboptimal classification performance on the minority class. To solve this problem, we introduce class confidence weights (CCW), which use the probability of attribute values given class labels to weight prototypes in kNN. The main advantage of CCW is that it corrects the inherent bias toward the majority class in existing kNN algorithms under any distance measure. Theoretical analysis and comprehensive experiments confirm our claims.
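
The weighting idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the probability of attribute values given a class label is estimated, so the sketch assumes a product of independent per-attribute Gaussians. The function names (`fit_class_stats`, `class_likelihood`, `ccw_knn_predict`), the `use_ccw` flag, and the toy data are all hypothetical. Each training prototype votes with weight p(x_i | y_i) instead of a unit vote.

```python
import math
from collections import defaultdict


def fit_class_stats(X, y):
    """Per-class, per-attribute mean and variance (hypothetical helper)."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    stats = {}
    for label, rows in groups.items():
        n, d = len(rows), len(rows[0])
        means = [sum(r[j] for r in rows) / n for j in range(d)]
        # Small constant keeps the variance strictly positive.
        variances = [
            sum((r[j] - means[j]) ** 2 for r in rows) / n + 1e-6
            for j in range(d)
        ]
        stats[label] = (means, variances)
    return stats


def class_likelihood(x, means, variances):
    """ASSUMED estimate of p(x | class): independent Gaussian densities."""
    log_p = 0.0
    for v, m, s2 in zip(x, means, variances):
        log_p += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
    return math.exp(log_p)


def ccw_knn_predict(X, y, query, k=3, use_ccw=True):
    """kNN vote where each neighbor counts with weight p(x_i | y_i).

    With use_ccw=False every neighbor counts 1, i.e. plain majority-vote kNN.
    """
    stats = fit_class_stats(X, y)
    if use_ccw:
        weights = [class_likelihood(x, *stats[lab]) for x, lab in zip(X, y)]
    else:
        weights = [1.0] * len(X)
    # k nearest prototypes under Euclidean distance (any distance would do).
    nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], query))[:k]
    votes = defaultdict(float)
    for i in nearest:
        votes[y[i]] += weights[i]
    return max(votes, key=votes.get)


# Toy imbalanced data: 8 spread-out majority points, 3 tight minority points.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.8], [7.0],
     [8.0], [8.2], [8.4]]
y = ["maj"] * 8 + ["min"] * 3
query = [7.3]

plain = ccw_knn_predict(X, y, query, k=3, use_ccw=False)
weighted = ccw_knn_predict(X, y, query, k=3, use_ccw=True)
print(plain, weighted)
```

In this toy setup the three nearest neighbors of the query are two majority points and one minority point, so the unweighted vote goes to the majority class. The two majority neighbors lie in the tail of their class's assumed density and get small weights, while the minority neighbor sits close to its class mean and gets a large one, so the weighted vote flips to the minority class. A different density estimator would change the weights but not the voting scheme.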