Scalable personalized medicine with active learning: detecting seizures with minimum labeled data

  • Authors: Guha Balakrishnan; Zeeshan Syed

  • Affiliations: University of Michigan, Ann Arbor, MI, USA; University of Michigan, Ann Arbor, MI, USA

  • Venue: Proceedings of the 1st ACM International Health Informatics Symposium
  • Year: 2010

Abstract

Existing research on personalized medicine has tended to focus on customizing patient care based on genetic data. In contrast, despite the large volumes of physiological data collected from patients in different settings, this information has found only limited use in personalized diagnosis and therapy. This trend can be attributed in part to the difficulty of reviewing the information that is collected. In many cases, clinical applications require the patient-specific behavior to be characterized by an expert before it can be used to develop automated decision-support systems. For large volumes of data, this has proven challenging because of the demands on human time and skills. In this paper, we explore the use of active learning to allow for scalable personalized medicine based on physiological data. We hypothesize that active learning can substantially reduce the amount of data that must be analyzed by experts while creating automated decision-support systems that are both personalized and highly accurate. We focus our work on a specific clinical application: the detection of epileptic seizures. While patient-specific detectors based on observations of seizures for each patient in long-term electroencephalographic signals have been shown to substantially improve performance over patient-non-specific detectors, this improvement comes at the cost of manually labeling seizure and non-seizure activity in large amounts of data for each patient. We describe how this approach can be improved and made more scalable by using a support vector machine-based active learning algorithm. When evaluated on continuous intracranial data from ten patients, our approach was able to reduce the amount of data reviewed by a skilled expert by nearly 95% without any noticeable effect on performance.
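
To make the idea of support vector machine-based active learning concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state the query strategy, kernel, or features, so the sketch assumes margin-based uncertainty sampling (query the unlabeled point closest to the current decision boundary), a linear SVM fitted by subgradient descent, and synthetic two-dimensional points standing in for EEG feature vectors. All function names (`train_linear_svm`, `active_learn`, `make_toy_pool`, `run_demo`) are hypothetical, and the `oracle` callback plays the role of the expert who labels a segment on request.

```python
import random

def decision(w, b, x):
    """Signed score w.x + b; a small magnitude means x lies near the boundary."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, lam=0.01, lr=0.1, steps=500):
    """Fit a linear SVM by batch subgradient descent on the regularized hinge loss.
    Labels in y must be +1 or -1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1:  # point violates the margin
                gw = [g - yi * xj / n for g, xj in zip(gw, xi)]
                gb -= yi / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def active_learn(pool, oracle, seed_idx, budget):
    """Assumed strategy: retrain, then ask the oracle to label the unlabeled
    point with the smallest |score|, repeating until the budget is spent."""
    labeled = {i: oracle(i) for i in seed_idx}
    for _ in range(budget):
        idx = sorted(labeled)
        w, b = train_linear_svm([pool[i] for i in idx], [labeled[i] for i in idx])
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        query = min(unlabeled, key=lambda i: abs(decision(w, b, pool[i])))
        labeled[query] = oracle(query)
    idx = sorted(labeled)
    w, b = train_linear_svm([pool[i] for i in idx], [labeled[i] for i in idx])
    return w, b, labeled

def make_toy_pool(n_per_class=100, seed=0):
    """Synthetic stand-in data: two Gaussian clusters, NOT real EEG features."""
    rng = random.Random(seed)
    pool, truth = [], []
    for label, center in ((1, 2.0), (-1, -2.0)):
        for _ in range(n_per_class):
            pool.append([rng.gauss(center, 0.5), rng.gauss(center, 0.5)])
            truth.append(label)
    return pool, truth

def run_demo():
    pool, truth = make_toy_pool()
    # Seed with one labeled example per class, then allow 10 expert queries.
    w, b, labeled = active_learn(pool, lambda i: truth[i],
                                 seed_idx=[0, len(pool) - 1], budget=10)
    correct = sum(1 for x, t in zip(pool, truth)
                  if (1 if decision(w, b, x) >= 0 else -1) == t)
    return len(labeled), correct / len(pool)

if __name__ == "__main__":
    n_labeled, accuracy = run_demo()
    print(f"labels requested: {n_labeled} of 200, pool accuracy: {accuracy:.2f}")
```

In this toy setting the learner requests labels for only 12 of 200 points, which illustrates the kind of labeling reduction the abstract describes. The clusters here are far easier to separate than seizure and non-seizure EEG, so the sketch says nothing about the reduction achievable on real recordings.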