KBA: Kernel Boundary Alignment Considering Imbalanced Data Distribution

Authors:
Gang Wu;Edward Y. Chang
Affiliations:
-;IEEE
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2005

Abstract

An imbalanced training data set can pose serious problems for many real-world data mining tasks that employ SVMs to conduct supervised learning. In this paper, we propose a kernel-boundary-alignment algorithm, which treats the training-data imbalance as prior information and uses it to augment SVMs and improve class-prediction accuracy. Using a simple example, we first show that SVMs can suffer from a high incidence of false negatives when the training instances of the target class are heavily outnumbered by the training instances of a nontarget class. The remedy we propose is to adjust the class boundary by modifying the kernel matrix according to the imbalanced data distribution. Through theoretical analysis backed by empirical study, we show that our kernel-boundary-alignment algorithm works effectively on several data sets.
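
To make the idea of "modifying the kernel matrix according to the imbalanced data distribution" concrete, the sketch below rescales an RBF kernel matrix with per-instance factors derived from class frequencies. This is an illustration only, not the authors' algorithm: the abstract does not give the actual transformation, so the helper names (`rbf_kernel_matrix`, `class_scaling_factors`, `rescale_kernel_matrix`), the factor formula, and the toy data are all assumptions made for this example.

```python
import math

def rbf_kernel_matrix(points, gamma=1.0):
    """Plain RBF kernel matrix: K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    n = len(points)
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(points[i], points[j])))
         for j in range(n)]
        for i in range(n)
    ]

def class_scaling_factors(labels):
    """Hypothetical per-instance factors (an assumption, not from the paper):
    d_i = sqrt(size of largest class / size of the class of instance i).
    Minority instances get d_i > 1, majority instances get d_i = 1."""
    counts = {c: labels.count(c) for c in set(labels)}
    largest = max(counts.values())
    return [math.sqrt(largest / counts[y]) for y in labels]

def rescale_kernel_matrix(K, factors):
    """Return K_mod[i][j] = d_i * d_j * K[i][j].
    This equals scaling each feature-space vector phi(x_i) by d_i, so a
    symmetric positive semidefinite K stays symmetric positive semidefinite."""
    n = len(K)
    return [[factors[i] * factors[j] * K[i][j] for j in range(n)] for i in range(n)]

# Toy imbalanced training set: one target (+1) instance, four nontarget (-1).
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (2.0, 0.0), (2.0, 1.0)]
labels = [+1, -1, -1, -1, -1]

K = rbf_kernel_matrix(points, gamma=0.5)
d = class_scaling_factors(labels)
K_mod = rescale_kernel_matrix(K, d)
```

A matrix such as `K_mod` could then be passed to any SVM solver that accepts a precomputed kernel. Whether, and by how much, the learned boundary moves away from the minority class depends on the chosen factors and on the solver, so this should be read as a starting point for experimentation rather than a reproduction of the paper's results.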