Learning from imbalanced data in surveillance of nosocomial infection

Authors:
Gilles Cohen;Mélanie Hilario;Hugo Sax;Stéphane Hugonnet;Antoine Geissbuhler
Affiliations:
Medical Informatics Service, University Hospital of Geneva, Geneva, Switzerland;Artificial Intelligence Laboratory, University of Geneva, Geneva, Switzerland;Department of Internal Medicine, University Hospital of Geneva, Geneva, Switzerland;Department of Internal Medicine, University Hospital of Geneva, Geneva, Switzerland;Medical Informatics Service, University Hospital of Geneva, Geneva, Switzerland
Venue:
Artificial Intelligence in Medicine
Year:
2006

Abstract

Objective: An important problem in hospitals is the monitoring and detection of nosocomial, or hospital-acquired, infections (NIs). This paper describes a retrospective analysis of a prevalence survey of NIs conducted at the Geneva University Hospital. Our goal is to identify patients with one or more NIs on the basis of clinical and other data collected during the survey.

Methods and material: Standard surveillance strategies are time-consuming and cannot be applied hospital-wide; alternative methods are required. When NI detection is viewed as a classification task, the main difficulty lies in the significant imbalance between positive, or infected, cases (11%) and negative cases (89%). To remedy class imbalance, we explore two distinct avenues: (1) a new resampling approach in which both oversampling of rare positives and undersampling of the noninfected majority rely on synthetic cases (prototypes) generated via class-specific subclustering, and (2) a support vector algorithm in which asymmetrical margins are tuned to improve recognition of rare positive cases.

Results and conclusion: Experiments have shown both approaches to be effective for the NI detection problem. Our novel resampling strategies perform remarkably better than classical random resampling. However, they are outperformed by asymmetrical soft margin support vector machines, which attained a sensitivity rate of 92%, significantly better than the highest sensitivity (87%) obtained via prototype-based resampling.
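
The idea of prototype-based resampling can be illustrated with a minimal sketch. The abstract does not say which subclustering algorithm is used or how the prototypes are combined with the original cases, so everything below is an assumption made for illustration: plain k-means stands in for the subclustering step, positive prototypes are appended to the original positives (oversampling), and negative prototypes replace the original negatives (undersampling). The function names `kmeans` and `prototype_resample`, the prototype counts, and the toy data are hypothetical and are not taken from the paper.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the subclustering step).

    Returns k centroids, each a list of floats. Requires k <= len(points).
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean distance).
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def prototype_resample(positives, negatives, n_pos_protos, n_neg_protos, seed=0):
    """Rebalance two classes using cluster centroids as synthetic prototypes.

    Assumption: positives are oversampled by adding prototypes to the original
    cases, and negatives are undersampled by keeping only their prototypes.
    """
    pos_protos = kmeans(positives, n_pos_protos, seed=seed)
    neg_protos = kmeans(negatives, n_neg_protos, seed=seed)
    resampled_pos = [list(p) for p in positives] + pos_protos
    resampled_neg = neg_protos
    return resampled_pos, resampled_neg


# Toy data with the 11% / 89% split mentioned in the abstract (values are made up).
rng = random.Random(1)
positives = [[rng.gauss(2.0, 1.0), rng.gauss(2.0, 1.0)] for _ in range(11)]
negatives = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(89)]

new_pos, new_neg = prototype_resample(positives, negatives, n_pos_protos=5, n_neg_protos=20)
print(len(new_pos), len(new_neg))  # class sizes after resampling
```

In this toy run the class ratio moves from 11:89 to 16:20. How many prototypes to generate per class is a tuning choice that the abstract does not specify.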