Countering imbalanced datasets to improve adverse drug event predictive models in labor and delivery

Authors:
L. M. Taft;R. S. Evans;C. R. Shyu;M. J. Egger;N. Chawla;J. A. Mitchell;S. N. Thornton;B. Bray;M. Varner
Affiliations:
Department of Biomedical Informatics, University of Utah Health Sciences Center, School of Medicine, 30 North 1900 East, Salt Lake City, Utah 84132, USA;Department of Biomedical Informatics, University of Utah Health Sciences Center, School of Medicine, 30 North 1900 East, Salt Lake City, Utah 84132, USA and Department of Medical Informatics, Inte ...;Department of Biomedical Informatics, University of Utah Health Sciences Center, School of Medicine, 30 North 1900 East, Salt Lake City, Utah 84132, USA and Informatics Institute, University of Mi ...;Department of Biomedical Informatics, University of Utah Health Sciences Center, School of Medicine, 30 North 1900 East, Salt Lake City, Utah 84132, USA;Department of Computer Science & Engg., University of Notre Dame, USA;Department of Biomedical Informatics, University of Utah Health Sciences Center, School of Medicine, 30 North 1900 East, Salt Lake City, Utah 84132, USA;Department of Biomedical Informatics, University of Utah Health Sciences Center, School of Medicine, 30 North 1900 East, Salt Lake City, Utah 84132, USA and Department of Medical Informatics, Inte ...;Department of Biomedical Informatics, University of Utah Health Sciences Center, School of Medicine, 30 North 1900 East, Salt Lake City, Utah 84132, USA;Department of Obstetrics and Gynecology, University of Utah, School of Medicine, USA
Venue:
Journal of Biomedical Informatics
Year:
2009

Abstract

Background: The IOM report, Preventing Medication Errors, emphasizes the overall lack of knowledge of the incidence of adverse drug events (ADE). Operating rooms, emergency departments and intensive care units are known to have a higher incidence of ADE. Labor and delivery (L&D) is an emergency care unit that could carry an increased risk of ADE, yet reported rates remain low and under-reporting is suspected. Risk factor identification with electronic pattern recognition techniques could improve ADE detection rates.

Objective: To apply the Synthetic Minority Over-sampling Technique (SMOTE) as an enhanced sampling method in a sparse dataset, in order to generate prediction models that identify ADE in women admitted for labor and delivery based on patient risk factors and comorbidities.

Results: By creating synthetic cases with the SMOTE algorithm and using 10-fold cross-validation, we demonstrated improved performance of the Naive Bayes and decision tree algorithms. The true positive rate (TPR) rose from 0.32 in the raw dataset to 0.67 in the 800% over-sampled dataset.

Conclusion: Synthetic minority class over-sampling can enhance the performance of classification algorithms in sparse clinical datasets. Predictive models created in this manner can be used to develop evidence-based ADE monitoring systems.
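
The over-sampling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`smote`, `true_positive_rate`), the neighbour count `k`, and the toy feature vectors are assumptions made for illustration. The sketch follows the basic SMOTE idea of interpolating between a minority case and one of its nearest minority neighbours, and it assumes purely numeric features; categorical risk factors would need a variant of the method. An over-sampling level of 800% corresponds to 8 synthetic cases per original minority case.

```python
import math
import random


def smote(minority, percent=800, k=5, seed=0):
    """Return synthetic minority samples (the originals are not included).

    percent=800 yields 8 synthetic samples per original minority case.
    """
    rng = random.Random(seed)
    per_sample = percent // 100
    synthetic = []
    for i, x in enumerate(minority):
        # k nearest minority-class neighbours of x, excluding x itself
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        for _ in range(per_sample):
            nn = rng.choice(neighbours)
            gap = rng.random()  # position along the segment from x to nn
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


def true_positive_rate(y_true, y_pred, positive=1):
    """TPR (sensitivity) = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy numeric feature vectors standing in for ADE cases (made-up values).
ade_cases = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_cases = smote(ade_cases, percent=800, k=3)
print(len(new_cases))  # prints 32 (4 cases x 8 synthetic samples each)
```

When such a step is combined with cross-validation, a common precaution is to generate synthetic cases from the training folds only, so that the held-out fold contains real cases alone; the abstract does not say how the study handled this.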