Active sampling for knowledge discovery from biomedical data

  • Authors:
  • Sriharsha Veeramachaneni;Francesca Demichelis;Emanuele Olivetti;Paolo Avesani

  • Affiliations:
  • SRA Division, ITC-IRST, Trento, Italy (all authors)

  • Venue:
  • PKDD'05: Proceedings of the 9th European Conference on Principles and Practice of Knowledge Discovery in Databases
  • Year:
  • 2005

Abstract

We describe work aimed at cost-constrained knowledge discovery in the biomedical domain. To improve diagnostic and prognostic models of cancer, researchers study new biomarkers that might provide predictive information. Biological samples from monitored patients are selected and analyzed to determine the predictive power of a candidate biomarker. Each evaluation consumes portions of the samples, which limits the number of measurements that can be performed. Biological samples obtained from carefully monitored patients and well annotated with pathological information are therefore a valuable resource that must be conserved. We present an active sampling algorithm, derived from statistical first principles, that incrementally chooses the samples that are most informative for estimating the efficacy of the candidate biomarker. We provide empirical evidence on real biomedical data that our active sampling algorithm requires significantly fewer samples than random sampling to ascertain the efficacy of the new biomarker.
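
To make the idea of incremental, budget-limited sample selection concrete, the following is a minimal sketch and not the authors' algorithm, which the abstract does not specify. It assumes a binary class label known for every stored sample, a binary biomarker that is costly to measure, and a simple uncertainty heuristic: measure next from the class whose Beta posterior over the biomarker rate has the largest variance. The names `active_sample`, `beta_variance`, and `measure`, as well as the synthetic data, are hypothetical and used only for illustration.

```python
import random


def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution."""
    return (a * b) / ((a + b) ** 2 * (a + b + 1))


def active_sample(labels, measure, budget, seed=0):
    """Greedy sketch: repeatedly measure a sample from the class whose
    biomarker-rate posterior is currently the most uncertain.

    labels  : known class labels (0 or 1), one per stored sample
    measure : hypothetical callable, index -> 0/1 biomarker value (the assay)
    budget  : maximum number of measurements allowed
    """
    rng = random.Random(seed)
    # Unmeasured sample indices, grouped by class label and shuffled.
    pool = {0: [], 1: []}
    for i, y in enumerate(labels):
        pool[y].append(i)
    for idxs in pool.values():
        rng.shuffle(idxs)
    # Beta(1, 1) prior on P(biomarker = 1 | class): [positives + 1, negatives + 1].
    post = {0: [1, 1], 1: [1, 1]}
    measured = []
    for _ in range(budget):
        candidates = [c for c in (0, 1) if pool[c]]
        if not candidates:
            break  # every sample has already been consumed
        c = max(candidates, key=lambda k: beta_variance(*post[k]))
        i = pool[c].pop()
        x = measure(i)
        post[c][0 if x == 1 else 1] += 1
        measured.append(i)
    # Posterior mean biomarker rate per class.
    rates = {c: post[c][0] / (post[c][0] + post[c][1]) for c in (0, 1)}
    return measured, rates


# Synthetic illustration only: 20 samples per class with made-up biomarker values.
labels = [0] * 20 + [1] * 20
truth = [0] * 18 + [1] * 2 + [1] * 15 + [0] * 5
measured, rates = active_sample(labels, lambda i: truth[i], budget=10)
```

Here `measured` lists the indices of the samples consumed and `rates` holds the posterior mean biomarker rate for each class. A random-sampling baseline would draw indices uniformly from the whole pool instead of consulting the posterior, which is the comparison the abstract refers to.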