Mining with rarity: a unifying framework

Authors:
Gary M. Weiss
Affiliations:
AT&T Laboratories, Piscataway, NJ
Venue:
ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
Year:
2004

Citing 27
Cited 166

Improved Estimates for the Accuracy of Small Disjuncts

Machine Learning
C4.5: programs for machine learning

C4.5: programs for machine learning
Mining association rules between sets of items in large databases

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Machine Learning for the Detection of Oil Spills in Satellite Radar Images

Machine Learning - Special issue on applications of machine learning and the knowledge discovery process
Mining association rules with multiple minimum supports

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Robust Classification for Imprecise Environments

Machine Learning
Mining needle in a haystack: classifying rare classes via two-phase rule induction

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Learning and making decisions when costs and probabilities are both unknown

Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Genetic Algorithms in Search, Optimization and Machine Learning

Genetic Algorithms in Search, Optimization and Machine Learning
Information Retrieval

Information Retrieval
Learning When Negative Examples Abound

ECML '97 Proceedings of the 9th European Conference on Machine Learning
Improving Minority Class Prediction Using Case-Specific Feature Weights

ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
AdaCost: Misclassification Cost-Sensitive Boosting

ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Evaluating Boosting Algorithms to Classify Rare Classes: Comparison and Improvements

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Exploiting the Cost (In)sensitivity of Decision Tree Splitting Criteria

ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
A Brief Introduction to Boosting

IJCAI '99 Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence
A Quantitative Study of Small Disjuncts

Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence
A Mixture-of-Experts Framework for Learning from Imbalanced Data Sets

IDA '01 Proceedings of the 4th International Conference on Advances in Intelligent Data Analysis
Predicting rare classes: can boosting make any weak learner strong?

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Evolutionary computation

Handbook of data mining and knowledge discovery
Tree Induction for Probability-Based Ranking

Machine Learning
The class imbalance problem: A systematic study

Intelligent Data Analysis
SMOTE: synthetic minority over-sampling technique

Journal of Artificial Intelligence Research
Learning when training data are costly: the effect of class distribution on tree induction

Journal of Artificial Intelligence Research
A novelty detection approach to classification

IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 1
The use of the area under the ROC curve in the evaluation of machine learning algorithms

Pattern Recognition
Lazy decision trees

AAAI'96 Proceedings of the thirteenth national conference on Artificial intelligence - Volume 1

Editorial: special issue on learning from imbalanced data sets

ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
Applying both positive and negative selection to supervised learning for anomaly detection

GECCO '05 Proceedings of the 7th annual conference on Genetic and evolutionary computation
KBA: Kernel Boundary Alignment Considering Imbalanced Data Distribution

IEEE Transactions on Knowledge and Data Engineering
Instance Filtering for entity recognition

ACM SIGKDD Explorations Newsletter - Natural language processing and text mining
Linear Asymmetric Classifier for cascade detectors

ICML '05 Proceedings of the 22nd international conference on Machine learning
Training Cost-Sensitive Neural Networks with Methods Addressing the Class Imbalance Problem

IEEE Transactions on Knowledge and Data Engineering
A probabilistic classifier system and its application in data mining

Evolutionary Computation
Focusing on non-respondents: Response modeling with novelty detectors

Expert Systems with Applications: An International Journal
Defect prevention in software processes: An action-based approach

Journal of Systems and Software
Rough Sets for Handling Imbalanced Data: Combining Filtering and Rule-based Classifiers

Fundamenta Informaticae - SPECIAL ISSUE ON CONCURRENCY SPECIFICATION AND PROGRAMMING (CS&P 2005) Ruciane-Nide, Poland, 28-30 September 2005
Local decomposition for rare class analysis

Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Cost-sensitive boosting for classification of imbalanced data

Pattern Recognition
Video diver: generic video indexing with diverse features

Proceedings of the international workshop on Workshop on multimedia information retrieval
Using classifier ensembles to label spatially disjoint data

Information Fusion
A weighted rough set based method developed for class imbalance learning

Information Sciences: an International Journal
Do unbalanced data have a negative effect on LDA?

Pattern Recognition
Learning verb complements for modern greek: Balancing the noisy dataset

Natural Language Engineering
Detection of stock price movements using chance discovery and genetic programming

International Journal of Knowledge-based and Intelligent Engineering Systems - Chance discovery
A study of the behaviour of linguistic fuzzy rule based classification systems in the framework of imbalanced data-sets

Fuzzy Sets and Systems
FAST: a roc-based feature selection metric for small samples and imbalanced data classification problems

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Helping Teachers Handle the Flood of Data in Online Student Discussions

ITS '08 Proceedings of the 9th international conference on Intelligent Tutoring Systems
Selective Pre-processing of Imbalanced Data for Improving Classification Performance

DaWaK '08 Proceedings of the 10th international conference on Data Warehousing and Knowledge Discovery
Imbalanced SVM Learning with Margin Compensation

ISNN '08 Proceedings of the 5th international symposium on Neural Networks: Advances in Neural Networks
Imbalanced text classification: A term weighting approach

Expert Systems with Applications: An International Journal
A comparative study on rough set based class imbalance learning

Knowledge-Based Systems
Integrating in-process software defect prediction with association mining to discover defect pattern

Information and Software Technology
Handling class imbalance in customer churn prediction

Expert Systems with Applications: An International Journal
Web robot detection: A probabilistic reasoning approach

Computer Networks: The International Journal of Computer and Telecommunications Networking
Hierarchical fuzzy rule based classification systems with genetic rule selection for imbalanced data-sets

International Journal of Approximate Reasoning
PAT: A pattern classification approach to automatic reference oracles for the testing of mesh simplification programs

Journal of Systems and Software
Locally application of cascade generalization for classification problems

Intelligent Decision Technologies
MDS: a novel method for class imbalance learning

Proceedings of the 3rd International Conference on Ubiquitous Information Management and Communication
Countering imbalanced datasets to improve adverse drug event predictive models in labor and delivery

Journal of Biomedical Informatics
On the influence of an adaptive inference system in fuzzy rule based classification systems for imbalanced data-sets

Expert Systems with Applications: An International Journal
Learning to improve area-under-FROC for imbalanced medical data classification using an ensemble method

ACM SIGKDD Explorations Newsletter
Mind the gaps: weighting the unknown in large-scale one-class collaborative filtering

Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Quantification and semi-supervised classification methods for handling changes in class distribution

Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Supervised Machine Learning: A Review of Classification Techniques

Proceedings of the 2007 conference on Emerging Artificial Intelligence Applications in Computer Engineering: Real Word AI Systems with Applications in eHealth, HCI, Information Retrieval and Pervasive Technologies
On multi-class cost-sensitive learning

AAAI'06 Proceedings of the 21st national conference on Artificial intelligence - Volume 1
The Needles-in-Haystack Problem

MLDM '09 Proceedings of the 6th International Conference on Machine Learning and Data Mining in Pattern Recognition
A Combination Classification Algorithm Based on Outlier Detection and C4.5

ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications
Rule Learning with Probabilistic Smoothing

DaWaK '09 Proceedings of the 11th International Conference on Data Warehousing and Knowledge Discovery
Using language modeling to select useful annotation data

SRWS '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Student Research Workshop and Doctoral Consortium
Knowledge discovery from imbalanced and noisy data

Data & Knowledge Engineering
Margin calibration in SVM class-imbalanced learning

Neurocomputing
Evolutionary sampling and software quality modeling of high-assurance systems

IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans
SVMs modeling for highly imbalanced classification

IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics - Special issue on human computing
Exploratory undersampling for class-imbalance learning

IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
An empirical comparison of repetitive undersampling techniques

IRI'09 Proceedings of the 10th IEEE international conference on Information Reuse & Integration
Classification of Imbalanced Data Sets by Using the Hybrid Re-sampling Algorithm Based on Isomap

ISICA '09 Proceedings of the 4th International Symposium on Advances in Computation and Intelligence
Support vector self-organizing learning for imbalanced medical data

IJCNN'09 Proceedings of the 2009 international joint conference on Neural Networks
Handling class imbalance problem in cultural modeling

ISI'09 Proceedings of the 2009 IEEE international conference on Intelligence and security informatics
On the 2-tuples based genetic tuning performance for fuzzy rule based classification systems in imbalanced data-sets

Information Sciences: an International Journal
Facetwise analysis of XCS for problems with class imbalances

IEEE Transactions on Evolutionary Computation
Improving software-quality predictions with data sampling and boosting

IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans
Evolutionary data analysis for the class imbalance problem

Intelligent Data Analysis
COG: local decomposition for rare class analysis

Data Mining and Knowledge Discovery
Online rare events detection

PAKDD'07 Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining
A survey on the application of genetic programming to classification

IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews
Improving spamdexing detection via a two-stage classification strategy

AIRS'08 Proceedings of the 4th Asia information retrieval conference on Information retrieval technology
A study of dynamic meta-learning for failure prediction in large-scale systems

Journal of Parallel and Distributed Computing
First elements on knowledge discovery guided by domain knowledge (KDDK)

CLA'06 Proceedings of the 4th international conference on Concept lattices and their applications
Study on customer churn prediction methods based on multiple classifiers combination

IITA'09 Proceedings of the 3rd international conference on Intelligent information technology application
FSVM-CIL: fuzzy support vector machines for class imbalance learning

IEEE Transactions on Fuzzy Systems - Special section on computing with words
How XCS deals with rarities in domains with continuous attributes

Proceedings of the 12th annual conference on Genetic and evolutionary computation
An investigation of real-valued accuracy-based learning classifier systems for electronic fraud detection

Proceedings of the 12th annual conference companion on Genetic and evolutionary computation
Supervised neural network modeling: an empirical investigation into learning from imbalanced data with labeling errors

IEEE Transactions on Neural Networks
Hierarchical associative classifier (HAC) for malware detection from the large and imbalanced gray list

Journal of Intelligent Information Systems
Robust weighted kernel logistic regression in imbalanced and rare events data

Computational Statistics & Data Analysis
Hierarchical service analytics for improving productivity in an enterprise service center

CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Improving the performance of Naive Bayes multinomial in e-mail foldering by introducing distribution-based balance of datasets

Expert Systems with Applications: An International Journal
CODE: a data complexity framework for imbalanced datasets

PAKDD'09 Proceedings of the 13th Pacific-Asia international conference on Knowledge discovery and data mining: new frontiers in applied data mining
An empirical study of applying ensembles of heterogeneous classifiers on imperfect data

PAKDD'09 Proceedings of the 13th Pacific-Asia international conference on Knowledge discovery and data mining: new frontiers in applied data mining
Finding minimal rare itemsets and rare association rules

KSEM'10 Proceedings of the 4th international conference on Knowledge science, engineering and management
RAMOBoost: ranked minority oversampling in boosting

IEEE Transactions on Neural Networks
A data mining framework for detecting subscription fraud in telecommunication

Engineering Applications of Artificial Intelligence
Supporting Collaborative Learning and E-Discussions Using Artificial Intelligence Techniques

International Journal of Artificial Intelligence in Education
Ensemble Learning with Active Example Selection for Imbalanced Biomedical Data Classification

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Detecting and ordering salient regions

Data Mining and Knowledge Discovery
Generalization of association rules through disjunction

Annals of Mathematics and Artificial Intelligence
Learning without default: a study of one-class classification and the low-default portfolio problem

AICS'09 Proceedings of the 20th Irish conference on Artificial intelligence and cognitive science
Novel techniques to reduce search space in multiple minimum supports-based frequent pattern mining algorithms

Proceedings of the 14th International Conference on Extending Database Technology
A dynamic over-sampling procedure based on sensitivity for multi-class problems

Pattern Recognition
Inactive learning?: difficulties employing active learning in practice

ACM SIGKDD Explorations Newsletter
An exploration of learning when data is noisy and imbalanced

Intelligent Data Analysis
Borderline over-sampling for imbalanced data classification

International Journal of Knowledge Engineering and Soft Data Paradigms
An empirical evaluation of rotation-based ensemble classifiers for customer churn prediction

Expert Systems with Applications: An International Journal
Genetic algorithms as a pre processing strategy for imbalanced datasets

Proceedings of the 13th annual conference companion on Genetic and evolutionary computation
Good seed makes a good crop: accelerating active learning using language modeling

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Distributed tuning of machine learning algorithms using MapReduce Clusters

Proceedings of the Third Workshop on Large Scale Data Mining: Theory and Applications
ClassySeg: a machine learning approach to automatic stroke segmentation

Proceedings of the Eighth Eurographics Symposium on Sketch-Based Interfaces and Modeling
Addressing the classification with imbalanced data: open problems and new challenges on class distribution

HAIS'11 Proceedings of the 6th international conference on Hybrid artificial intelligent systems - Volume Part I
Margin-based over-sampling method for learning from imbalanced datasets

PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part II
Improving k nearest neighbor with exemplar generalization for imbalanced classification

PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part II
Sample subset optimization for classifying imbalanced biological data

PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part II
Mining competitor relationships from online news: A network-based approach

Electronic Commerce Research and Applications
Asymmetric Kernel scaling for imbalanced data classification

WILF'11 Proceedings of the 9th international conference on Fuzzy logic and applications
Data preparation techniques for improving rare class prediction

MAMECTIS/NOLASC/CONTROL/WAMUS'11 Proceedings of the 13th WSEAS international conference on mathematical methods, computational techniques and intelligent systems, and 10th WSEAS international conference on non-linear analysis, non-linear systems and chaos, and 7th WSEAS international conference on dynamical systems and control, and 11th WSEAS international conference on Wavelet analysis and multirate systems: recent researches in computational techniques, non-linear systems and control
A learning strategy for highly imbalanced classification

Proceedings of the Third International Conference on Internet Multimedia Computing and Service
Clustering based bagging algorithm on imbalanced data sets

IUKM'11 Proceedings of the 2011 international conference on Integrated uncertainty in knowledge modelling and decision making
Multi-instance multi-label learning

Artificial Intelligence
Drosophila Gene Expression Pattern Annotation through Multi-Instance Multi-Label Learning

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Automatic annotation of protein functional class from sparse and imbalanced data sets

VDMB'06 Proceedings of the First international conference on Data Mining and Bioinformatics
The novelty detection approach for different degrees of class imbalance

ICONIP'06 Proceedings of the 13th international conference on Neural Information Processing - Volume Part II
Improving SVM training by means of NTIL when the data sets are imbalanced

ISMIS'06 Proceedings of the 16th international conference on Foundations of Intelligent Systems
Optimisation and evaluation of random forests for imbalanced datasets

ISMIS'06 Proceedings of the 16th international conference on Foundations of Intelligent Systems
Adjusting and generalizing CBA algorithm to handling class imbalance

Expert Systems with Applications: An International Journal
The class imbalance problem in TLC image classification

ICIAR'06 Proceedings of the Third international conference on Image Analysis and Recognition - Volume Part II
Mining rare association rules in the datasets with widely varying items' frequencies

DASFAA'10 Proceedings of the 15th international conference on Database Systems for Advanced Applications - Volume Part I
A Kolmogorov-Smirnov statistic based segmentation approach to learning from imbalanced datasets: With application in property refinance prediction

Expert Systems with Applications: An International Journal
Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning

ICIC'05 Proceedings of the 2005 international conference on Advances in Intelligent Computing - Volume Part I
Evolving neural networks with maximum AUC for imbalanced data classification

HAIS'10 Proceedings of the 5th international conference on Hybrid Artificial Intelligence Systems - Volume Part I
Reconciling performance and interpretability in customer churn prediction using ensemble learning based on generalized additive models

Expert Systems with Applications: An International Journal
Relay boost fusion for learning rare concepts in multimedia

CIVR'06 Proceedings of the 5th international conference on Image and Video Retrieval
A novel synthetic minority oversampling technique for imbalanced data set learning

ICONIP'11 Proceedings of the 18th international conference on Neural Information Processing - Volume Part II
Preprocessing unbalanced data using support vector machine

Decision Support Systems
Handling concept drift via ensemble and class distribution estimation technique

ADMA'11 Proceedings of the 7th international conference on Advanced Data Mining and Applications - Volume Part II
Controlling multi-class error rates for MLP classifier by bias adjustment based on penalty matrix

Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication
A new over-sampling approach: Random-SMOTE for learning from imbalanced data sets

KSEM'11 Proceedings of the 5th international conference on Knowledge Science, Engineering and Management
An efficient approach to mine periodic-frequent patterns in transactional databases

PAKDD'11 Proceedings of the 15th international conference on New Frontiers in Applied Data Mining
An empirical study of bagging predictors for imbalanced data with different levels of class distribution

AI'11 Proceedings of the 24th international conference on Advances in Artificial Intelligence
A combined SMOTE and PSO based RBF classifier for two-class imbalanced problems

Neurocomputing
Save the best for last? The treatment of dominant predictors in financial forecasting

Expert Systems with Applications: An International Journal
Screening nonrandomized studies for medical systematic reviews: A comparative study of classifiers

Artificial Intelligence in Medicine
Estimating conversion rate in display advertising from past performance data

Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Foundation of mining class-imbalanced data

PAKDD'12 Proceedings of the 16th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part I
A pruning-based approach for searching precise and generalized region for synthetic minority over-sampling

PAKDD'12 Proceedings of the 16th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part II
Increasing the effectiveness of associative classification in terms of class imbalance by using a novel pruning algorithm

Expert Systems with Applications: An International Journal
A novel classification algorithm to noise data

ICSI'12 Proceedings of the Third international conference on Advances in Swarm Intelligence - Volume Part II
Extensions of ant-miner algorithm to deal with class imbalance problem

IDEAL'12 Proceedings of the 13th international conference on Intelligent Data Engineering and Automated Learning
Time-series data mining

ACM Computing Surveys (CSUR)
BRACID: a comprehensive approach to learning rules from imbalanced data

Journal of Intelligent Information Systems
PUCK: an automated prompting system for smart environments: toward achieving automated prompting--challenges involved

Personal and Ubiquitous Computing
DBFS: An effective Density Based Feature Selection scheme for small sample size and high dimensional imbalanced data sets

Data & Knowledge Engineering
A hierarchical genetic fuzzy system based on genetic programming for addressing classification with highly imbalanced and borderline data-sets

Knowledge-Based Systems
Over-Sampling from an auxiliary domain

ICONIP'12 Proceedings of the 19th international conference on Neural Information Processing - Volume Part I
App recommendation: a contest between satisfaction and temptation

Proceedings of the sixth ACM international conference on Web search and data mining
Determination of Algorithms Making Balance Between Accuracy and Comprehensibility in Churn Prediction Setting

International Journal of Information Retrieval Research
Analysing the classification of imbalanced data-sets with multiple classes: Binarization techniques and ad-hoc approaches

Knowledge-Based Systems
Cost-Sensitive Learning via Priority Sampling to Improve the Return on Marketing and CRM Investment

Journal of Management Information Systems
Feature selection for high-dimensional imbalanced data

Neurocomputing
An empirical study of learning from imbalanced data

ADC '11 Proceedings of the Twenty-Second Australasian Database Conference - Volume 115
Detection and classification of peer-to-peer traffic: A survey

ACM Computing Surveys (CSUR)
Prediction of body mass index status from voice signals based on machine learning for automated medical applications

Artificial Intelligence in Medicine
Effective detection of sophisticated online banking fraud on extremely imbalanced data

World Wide Web
Imprecise imputation as a tool for solving classification problems with mean values of unobserved features

Advances in Artificial Intelligence
An improved neighborhood-restricted association rule-based recommender system

ADC '13 Proceedings of the Twenty-Fourth Australasian Database Conference - Volume 137
Class imbalance and the curse of minority hubs

Knowledge-Based Systems
Variance inflation in high dimensional Support Vector Machines

Pattern Recognition Letters
Causal inference with rare events in large-scale time-series data

IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
MEFES: An evolutionary proposal for the detection of exceptions in subgroup discovery. An application to Concentrating Photovoltaic Technology

Knowledge-Based Systems
Training and assessing classification rules with imbalanced data

Data Mining and Knowledge Discovery
Addressing imbalanced classification with instance generation techniques: IPADE-ID

Neurocomputing
On the importance of the validation technique for classification with imbalanced datasets: Addressing covariate shift when data is skewed

Information Sciences: an International Journal
Adjusted F-measure and kernel scaling for imbalanced data learning

Information Sciences: an International Journal
Imbalanced data classification using second-order cone programming support vector machines

Pattern Recognition
Technical Section: A machine learning approach to automatic stroke segmentation

Computers and Graphics
Enhanced and hierarchical structure algorithm for data imbalance problem in semantic extraction under massive video dataset

Multimedia Tools and Applications
A time-efficient breadth-first level-wise lattice-traversal algorithm to discover rare itemsets

Data Mining and Knowledge Discovery
Aggregative quantification for regression

Data Mining and Knowledge Discovery
Imbalanced evolving self-organizing learning

Neurocomputing
Key roles of closed sets and minimal generators in concise representations of frequent patterns

Intelligent Data Analysis
IIvotes ensemble for imbalanced data

Intelligent Data Analysis - Combined Learning Methods and Mining Complex Data
Robust classification of imbalanced data using one-class and two-class SVM-based multiclassifiers

Intelligent Data Analysis - Business Analytics and Intelligent Optimization

Abstract

Rare objects are often of great interest and great value. Until recently, however, rarity has not received much attention in the context of data mining. Now, as increasingly complex real-world problems are addressed, rarity, and the related problem of imbalanced data, are taking center stage. This article discusses the role that rare classes and rare cases play in data mining. The problems that can result from these two forms of rarity are described in detail, as are methods for addressing these problems. These descriptions draw on examples from existing research, so that the article provides a good survey of the literature on rarity in data mining. This article also demonstrates that rare classes and rare cases are very similar phenomena: both forms of rarity are shown to cause similar problems during data mining and to benefit from the same remediation methods.