Active learning for sampling in time-series experiments with application to gene expression analysis

Authors:
Rohit Singh;Nathan Palmer;David Gifford;Bonnie Berger;Ziv Bar-Joseph
Affiliations:
Massachusetts Institute of Technology, Cambridge MA;Massachusetts Institute of Technology, Cambridge MA;Massachusetts Institute of Technology, Cambridge MA;Massachusetts Institute of Technology, Cambridge MA;Carnegie Mellon University, Pittsburgh PA
Venue:
ICML '05 Proceedings of the 22nd international conference on Machine learning
Year:
2005

Abstract

Many time-series experiments seek to estimate some signal as a continuous function of time. In this paper, we address the sampling problem for such experiments: determining which time-points ought to be sampled in order to minimize the cost of data collection. We restrict our attention to a growing class of experiments that measure multiple signals at each time-point and in which raw materials or observations are archived first and selectively analyzed later, with the analysis being the more expensive step. We present an active learning algorithm that iteratively chooses time-points to sample, using the uncertainty in the quality of the currently estimated time-dependent curve as the objective function. Using simulated data as well as gene expression data, we show that our algorithm performs well and can significantly reduce experimental cost without loss of information.
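
The iterative selection loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the curve model or the uncertainty estimate, so the sketch swaps in piecewise-linear interpolation and measures uncertainty as the disagreement among a committee of leave-one-out fits, averaged over all signals. The names `committee_uncertainty`, `choose_time_points`, `measure`, and `budget` are hypothetical, and the starting design (first, middle, and last archived time-point) is an assumption.

```python
import numpy as np


def committee_uncertainty(t_all, sampled_idx, values):
    """Disagreement among leave-one-out linear interpolants at every time-point.

    values has shape (len(sampled_idx), n_signals), rows ordered like sampled_idx.
    Assumption: linear interpolation stands in for the paper's curve model.
    """
    t_s = t_all[np.asarray(sampled_idx)]
    # Committee: the full fit plus one fit per dropped interior sample.
    masks = [np.ones(len(t_s), dtype=bool)]
    for j in range(1, len(t_s) - 1):
        m = np.ones(len(t_s), dtype=bool)
        m[j] = False
        masks.append(m)
    preds = np.stack([
        np.stack([np.interp(t_all, t_s[m], values[m, g])
                  for g in range(values.shape[1])], axis=1)
        for m in masks
    ])  # shape: (committee, time, signal)
    # Variance over the committee, averaged over signals.
    return preds.var(axis=0).mean(axis=1)


def choose_time_points(t_all, measure, budget):
    """Greedily pick indices into t_all; measure(i) is the expensive analysis step.

    At least the three starting points are always measured, even if budget < 3.
    """
    n = len(t_all)
    budget = min(budget, n)  # cannot analyze more samples than were archived
    sampled = sorted({0, n // 2, n - 1})
    data = {i: np.asarray(measure(i), dtype=float) for i in sampled}
    while len(sampled) < budget:
        values = np.stack([data[i] for i in sampled])
        score = committee_uncertainty(t_all, sampled, values)
        score[sampled] = -np.inf  # never re-measure a time-point
        nxt = int(np.argmax(score))
        data[nxt] = np.asarray(measure(nxt), dtype=float)
        sampled = sorted(sampled + [nxt])
    return sampled, data
```

A caller would pass the increasing array of archived time-points as `t_all` and a function `measure(i)` that returns the vector of signals (for example, expression levels of several genes) at archived index `i`. Each call to `measure` corresponds to one expensive analysis, so `budget` caps the experimental cost.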