Robust discovery of local patterns: subsets and stratification in adverse drug reaction surveillance

Authors:
Johan Hopstadius; G. Niklas Norén
Affiliations:
Uppsala Monitoring Centre, WHO Collaborating Centre for International Drug Monitoring, Uppsala, Sweden; Uppsala Monitoring Centre, WHO Collaborating Centre for International Drug Monitoring & Stockholm University, Uppsala, Sweden
Venue:
Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium
Year:
2012

Abstract

The identification of unanticipated statistical associations is a core activity in exploratory analysis of high-dimensional biomedical data. Specifically, post-marketing surveillance for harmful effects of medicines relies on effective algorithms to detect associations between drugs and suspected adverse drug reactions. The WHO global individual case safety reports database, VigiBase, holds over six million reports and covers more than ten thousand medicinal products and thousands of distinct medical concepts. It collects data from more than 100 countries across the world, and its first reports date back to the late 1960s. Local patterns may not show in database-wide analyses, and many other patterns vary substantially in strength or direction across data subsets. Still, routine screening of this and similar databases relies on global measures of association. In this paper, we propose a framework to detect local associations and characterise subset variability in high-dimensional data. We use shrinkage observed-to-expected ratios and employ multiple stratification by one or two covariates at a time. We consider subset-specific measures, stratified-then-pooled adjusted measures, and a novel measure to detect associations that hold in all but one subset. We use covariate permutation to select stratification covariates and gauge the vulnerability to spurious associations. Chance findings are a major concern: a naive subgroup analysis yielded more than 50% spurious local associations in VigiBase. To improve on this, we enforce conservative credibility intervals and also look for subset-specific associations that reproduce in at least one additional subset (e.g. two time periods). In addition to 119,500 global associations between drugs and medical events in VigiBase, such robust subgroup analysis uncovered 14,600 local associations, of which an estimated 2.2% are spurious.
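
To make the measures named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not give formulas, so several choices here are assumptions: the shrinkage ratio is taken as log2((O + 0.5) / (E + 0.5)), the expected count within a stratum is computed under independence of drug and event, and the pooled measure sums observed and expected counts over strata. The report layout (`drugs`, `events`, `period`) and all function names are hypothetical. Credibility intervals, which the framework relies on to limit chance findings, are omitted.

```python
import math
import random
from collections import defaultdict


def shrunk_log_oe(observed, expected, shrink=0.5):
    """log2 of a shrinkage observed-to-expected ratio.

    Assumption: `shrink` is added to both counts, which pulls the
    ratio toward 1 (score toward 0) when counts are small.
    """
    return math.log2((observed + shrink) / (expected + shrink))


def stratum_counts(reports, drug, event, covariate):
    """Per-stratum tallies: [reports, with drug, with event, with both]."""
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for r in reports:
        c = counts[r[covariate]]
        has_drug = drug in r["drugs"]
        has_event = event in r["events"]
        c[0] += 1
        c[1] += has_drug
        c[2] += has_event
        c[3] += has_drug and has_event
    return dict(counts)


def expected_count(n, n_drug, n_event):
    """Expected co-reports if drug and event were independent in the stratum."""
    return n_drug * n_event / n if n else 0.0


def subset_scores(counts):
    """Subset-specific score for each stratum."""
    return {s: shrunk_log_oe(both, expected_count(n, nd, ne))
            for s, (n, nd, ne, both) in counts.items()}


def pooled_score(counts):
    """Stratified-then-pooled score: sum observed and expected over strata."""
    obs = sum(c[3] for c in counts.values())
    exp = sum(expected_count(c[0], c[1], c[2]) for c in counts.values())
    return shrunk_log_oe(obs, exp)


def permute_covariate(reports, covariate, seed=0):
    """Copies of the reports with the covariate labels shuffled across reports."""
    labels = [r[covariate] for r in reports]
    random.Random(seed).shuffle(labels)
    return [{**r, covariate: lab} for r, lab in zip(reports, labels)]


# Toy data (fabricated for illustration only, not from VigiBase).
reports = [
    {"drugs": {"d"}, "events": {"e"}, "period": "old"},
    {"drugs": {"d"}, "events": {"e"}, "period": "old"},
    {"drugs": {"x"}, "events": {"f"}, "period": "old"},
    {"drugs": {"x"}, "events": {"f"}, "period": "old"},
    {"drugs": {"d"}, "events": {"f"}, "period": "new"},
    {"drugs": {"x"}, "events": {"e"}, "period": "new"},
]
counts = stratum_counts(reports, "d", "e", "period")
scores = subset_scores(counts)        # positive in "old", negative in "new"
pooled = pooled_score(counts)         # adjusted for the "period" covariate
shuffled = permute_covariate(reports, "period")
```

In this toy example the association is local: it appears only in the `old` stratum, and the pooled score is weaker than the `old` subset score. Repeating the subset analysis on `shuffled` reports, where any subset-specific signal can only arise by chance, is one way to approximate the permutation check described above.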