C4.5: programs for machine learning
C4.5: programs for machine learning
Discriminant Adaptive Nearest Neighbor Classification
IEEE Transactions on Pattern Analysis and Machine Intelligence
Machine Learning
Machine Learning
An Instance-Weighting Method to Induce Cost-Sensitive Trees
IEEE Transactions on Knowledge and Data Engineering
On the Surprising Behavior of Distance Metrics in High Dimensional Spaces
ICDT '01 Proceedings of the 8th International Conference on Database Theory
Adaptive Quasiconformal Kernel Nearest Neighbor Classification
IEEE Transactions on Pattern Analysis and Machine Intelligence
Mining with rarity: a unifying framework
ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
A study of the behavior of several methods for balancing machine learning training data
ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
KBA: Kernel Boundary Alignment Considering Imbalanced Data Distribution
IEEE Transactions on Knowledge and Data Engineering
Does cost-sensitive learning beat sampling for classifying rare classes?
UBDM '05 Proceedings of the 1st international workshop on Utility-based data mining
SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition
CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2
Exploratory Under-Sampling for Class-Imbalance Learning
ICDM '06 Proceedings of the Sixth International Conference on Data Mining
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)
Improving nearest neighbor rule with a simple adaptive distance measure
Pattern Recognition Letters
The Concentration of Fractional Distances
IEEE Transactions on Knowledge and Data Engineering
Active learning for class imbalance problem
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Learning on the border: active learning in imbalanced data classification
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Skewed Class Distributions and Mislabeled Examples
ICDMW '07 Proceedings of the Seventh IEEE International Conference on Data Mining Workshops
On the k-NN performance in a challenging scenario of imbalance and overlapping
Pattern Analysis & Applications - Special Issue: Non-parametric distance-based classification techniques and their applications
IKNN: Informative K-Nearest Neighbor Pattern Classification
PKDD 2007 Proceedings of the 11th European conference on Principles and Practice of Knowledge Discovery in Databases
A comparative study on rough set based class imbalance learning
Knowledge-Based Systems
A New Approach to Fuzzy-Rough Nearest Neighbour Classification
RSCTC '08 Proceedings of the 6th International Conference on Rough Sets and Current Trends in Computing
Bioinformatics
Nearest neighbors in high-dimensional data: the emergence and influence of hubs
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
When is 'nearest neighbour' meaningful: A converse theorem and implications
Journal of Complexity
IEEE Transactions on Knowledge and Data Engineering
SMOTE: synthetic minority over-sampling technique
Journal of Artificial Intelligence Research
Concept learning and the problem of small disjuncts
IJCAI'89 Proceedings of the 11th international joint conference on Artificial intelligence - Volume 1
How does high dimensionality affect collaborative filtering?
Proceedings of the third ACM conference on Recommender systems
Neighbor-weighted K-nearest neighbor for unbalanced text corpus
Expert Systems with Applications: An International Journal
Fast Approximate kNN Graph Construction for High Dimensional Data via Recursive Lanczos Bisection
The Journal of Machine Learning Research
Adaptive k-nearest-neighbor classification using a dynamic number of nearest neighbors
ADBIS'07 Proceedings of the 11th East European conference on Advances in databases and information systems
On the existence of obstinate results in vector space models
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data
The Journal of Machine Learning Research
The role of hubness in clustering high-dimensional data
PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
HAIS'11 Proceedings of the 6th international conference on Hybrid artificial intelligent systems - Volume Part I
INSIGHT: efficient and effective instance selection for time-series classification
PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part II
Improving k nearest neighbor with exemplar generalization for imbalanced classification
PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part II
Class confidence weighted kNN algorithms for imbalanced data sets
PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part II
Identifying mislabeled training data with the aid of unlabeled data
Applied Intelligence
A probabilistic approach to nearest-neighbor classification: naive hubness bayesian kNN
Proceedings of the 20th ACM international conference on Information and knowledge management
UAI'95 Proceedings of the Eleventh conference on Uncertainty in artificial intelligence
Expert Systems with Applications: An International Journal
A probabilistic approach for semi-supervised nearest neighbor classification
Pattern Recognition Letters
Hubness-Aware shared neighbor distances for high-dimensional k-nearest neighbor classification
HAIS'12 Proceedings of the 7th international conference on Hybrid Artificial Intelligent Systems - Volume Part II
Identification of different types of minority class examples in imbalanced data
HAIS'12 Proceedings of the 7th international conference on Hybrid Artificial Intelligent Systems - Volume Part II
A Kernel-Based Two-Class Classifier for Imbalanced Data Sets
IEEE Transactions on Neural Networks
Foundation of mining class-imbalanced data
PAKDD'12 Proceedings of the 16th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part I
Novel classifier scheme for imbalanced problems
Pattern Recognition Letters
Local and global scaling reduce hubs in space
The Journal of Machine Learning Research
Hi-index | 0.00 |
Most machine learning tasks involve learning from high-dimensional data, which is often quite difficult to handle. Hubness is an aspect of the curse of dimensionality that was shown to be highly detrimental to k-nearest neighbor methods in high-dimensional feature spaces. Hubs, very frequent nearest neighbors, emerge as centers of influence within the data and often act as semantic singularities. This paper deals with evaluating the impact of hubness on learning under class imbalance with k-nearest neighbor methods. Our results suggest that, contrary to the common belief, minority class hubs might be responsible for most misclassification in many high-dimensional datasets. The standard approaches to learning under class imbalance usually clearly favor the instances of the minority class and are not well suited for handling such highly detrimental minority points. In our experiments, we have evaluated several state-of-the-art hubness-aware kNN classifiers that are based on learning from the neighbor occurrence models calculated from the training data. The experiments included learning under severe class imbalance, class overlap and mislabeling and the results suggest that the hubness-aware methods usually achieve promising results on the examined high-dimensional datasets. The improvements seem to be most pronounced when handling the difficult point types: borderline points, rare points and outliers. On most examined datasets, the hubness-aware approaches improve the classification precision of the minority classes and the recall of the majority class, which helps with reducing the negative impact of minority hubs. We argue that it might prove beneficial to combine the extensible hubness-aware voting frameworks with the existing class imbalanced kNN classifiers, in order to properly handle class imbalanced data in high-dimensional feature spaces.