A multivariate probabilistic method for comparing two clinical datasets

  • Authors:
  • Yuriy Sverchkov; Shyam Visweswaran; Gilles Clermont; Milos Hauskrecht; Gregory F. Cooper

  • Affiliations:
  • University of Pittsburgh, Pittsburgh, PA, USA (all five authors)

  • Venue:
  • Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium
  • Year:
  • 2012

Abstract

We present a novel method for obtaining a concise and mathematically grounded description of multivariate differences between a pair of clinical datasets. Data collected under similar circumstances often reflect fundamentally different patterns. For example, records of patients undergoing similar treatments in different intensive care units (ICUs), or in the same ICU during different periods, may show systematically different outcomes. In such cases, the multivariate probability distributions induced by the two datasets differ in some respects but not in others. To capture the probabilistic relationships, we learn a Bayesian network (BN) from the union of the two datasets, including an indicator variable that records which dataset each patient record came from. We then extract the relevant conditional distributions from the network by finding the conditional probabilities that differ most when conditioning on the indicator variable. Our work is a form of explanation that bears some similarity to previous work on BN explanation; however, whereas previous work has mostly focused on justifying inference, our work aims to explain multivariate differences between distributions.
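The core idea (pool the two datasets, add an indicator variable, then rank conditional distributions by how much they change with the indicator) can be illustrated with a minimal sketch. It assumes discrete variables and a hand-specified parent set for each variable, standing in for the learned BN structure; structure learning itself is not shown. The function names, the toy ICU variables `vent` and `outcome`, Laplace smoothing, and total variation distance as the difference measure are all illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def union_with_indicator(data_a, data_b, indicator="D"):
    """Pool two lists of dict records, tagging each with its source (0 or 1)."""
    pooled = [dict(r, **{indicator: 0}) for r in data_a]
    pooled += [dict(r, **{indicator: 1}) for r in data_b]
    return pooled


def conditional(pooled, target, parents, indicator="D", alpha=1.0):
    """Estimate P(target | parents, indicator) with Laplace smoothing (alpha)."""
    values = sorted({r[target] for r in pooled})
    joint = Counter((tuple(r[p] for p in parents), r[indicator], r[target])
                    for r in pooled)
    contexts = {(tuple(r[p] for p in parents), r[indicator]) for r in pooled}
    cpt = {}
    for ctx, d in contexts:
        counts = [joint[(ctx, d, v)] + alpha for v in values]
        total = sum(counts)
        cpt[(ctx, d)] = {v: c / total for v, c in zip(values, counts)}
    return cpt


def total_variation(p, q):
    """Total variation distance between two distributions over the same values."""
    return 0.5 * sum(abs(p[v] - q[v]) for v in p)


def rank_differences(pooled, structure, indicator="D"):
    """Rank (target, parent context) pairs by how much P(target | ctx) differs
    between indicator = 0 and indicator = 1. Parent contexts observed in only
    one dataset are skipped."""
    results = []
    for target, parents in structure.items():
        cpt = conditional(pooled, target, parents, indicator)
        for (ctx, d), dist in cpt.items():
            if d != 0 or (ctx, 1) not in cpt:
                continue
            tv = total_variation(dist, cpt[(ctx, 1)])
            results.append((tv, target, dict(zip(parents, ctx))))
    results.sort(key=lambda t: t[0], reverse=True)
    return results


# Hypothetical toy data for two ICUs (not from the paper).
data_a = ([{"vent": "yes", "outcome": "died"}] * 2
          + [{"vent": "yes", "outcome": "survived"}] * 2
          + [{"vent": "no", "outcome": "survived"}] * 4)
data_b = ([{"vent": "yes", "outcome": "died"}] * 1
          + [{"vent": "yes", "outcome": "survived"}] * 3
          + [{"vent": "no", "outcome": "survived"}] * 4)

# Hand-specified parents, standing in for a learned network structure.
structure = {"outcome": ["vent"], "vent": []}

ranked = rank_differences(union_with_indicator(data_a, data_b), structure)
for tv, target, ctx in ranked:
    print(f"{target} given {ctx}: TV = {tv:.3f}")
```

On this toy data the largest change is in P(outcome | vent = yes), with a smoothed total variation distance of about 0.167, while the other conditionals are identical across the two ICUs. Any divergence between distributions (for example, KL divergence) could replace total variation in this sketch.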