Extreme re-balancing for SVMs: a case study

  • Authors:
  • Bhavani Raskutti; Adam Kowalczyk

  • Affiliations:
  • Telstra Corporation, Clayton, Victoria, Australia (both authors)

  • Venue:
  • ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
  • Year:
  • 2004

Abstract

There are many practical applications where learning from single-class examples is either the only possible solution or has a distinct performance advantage. The first case occurs when obtaining examples of a second class is difficult, e.g., classifying sites of "interest" based on web accesses. The second situation is exemplified by the gene knock-out experiments for understanding the Aryl Hydrocarbon Receptor signalling pathway that provided the data for the second task of the KDD 2002 Cup, where minority one-class SVMs significantly outperform models learnt using examples from both classes.

This paper explores the limits of supervised learning of a two-class discrimination from data with heavily unbalanced class proportions. We focus on the case of supervised learning with support vector machines. We consider the impact of both sampling and weighting imbalance-compensation techniques, and then extend the balancing to extreme situations where one of the classes is ignored completely and the learning is accomplished using examples from a single class.

Our investigation with the data for the KDD 2002 Cup, as well as text benchmarks such as the Reuters newswire, shows that there is a consistent pattern of performance differences between one- and two-class learning for all SVMs investigated, and these patterns persist even with aggressive dimensionality reduction through automated feature selection. Using insight gained from the above analysis, we generate synthetic data showing a similar pattern of performance.
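The two imbalance-compensation styles the abstract contrasts can be sketched for a linear SVM objective: weighting multiplies each class's hinge-loss contribution by a per-class cost, while sampling replicates minority examples so they appear more often in the training set. The sketch below is illustrative only (the function names, toy data, and scorer `f(x) = w*x + b` are our own, not taken from the paper):

```python
# Hedged sketch of the two compensation techniques discussed in the abstract:
# (1) per-class weighting of the hinge loss, (2) naive minority oversampling.
# All names and data here are illustrative, not from the paper.

def weighted_hinge_loss(w, b, X, y, class_weight):
    """Class-weighted hinge loss for a 1-D linear scorer f(x) = w*x + b.

    `class_weight` maps each label (+1 / -1) to its cost multiplier; raising
    the minority label's weight makes its margin violations more expensive.
    """
    total = 0.0
    for x, label in zip(X, y):
        margin = label * (w * x + b)          # >= 1 means correct with margin
        total += class_weight[label] * max(0.0, 1.0 - margin)
    return total

def oversample(X, y, minority_label, factor):
    """Replicate each minority example `factor` extra times (naive resampling)."""
    X_new, y_new = list(X), list(y)
    for x, label in zip(X, y):
        if label == minority_label:
            X_new += [x] * factor
            y_new += [label] * factor
    return X_new, y_new

# Toy data: one well-classified negative, one positive inside the margin.
X, y = [0.0, 2.2], [-1, +1]
base = weighted_hinge_loss(1.0, -2.0, X, y, {+1: 1.0, -1: 1.0})   # 0.8
boosted = weighted_hinge_loss(1.0, -2.0, X, y, {+1: 10.0, -1: 1.0})  # 8.0
X_os, y_os = oversample(X, y, minority_label=+1, factor=3)  # 4 positives total
```

Both routes bias the learned boundary toward the minority class; the extreme case the paper studies then discards the majority class entirely and fits a one-class model on the minority examples alone.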