Active learning for microarray data

  • Authors:
  • D. Vogiatzis; N. Tsapatsoulis

  • Affiliations:
  • University of Cyprus, Department of Computer Science, Kallipoleos 75, CY-1678 Nicosia, Cyprus; University of Peloponnese, Department of Telecommunications Science and Technology, End of Karaiskaki Street, 22100 Tripoli, Greece

  • Venue:
  • International Journal of Approximate Reasoning
  • Year:
  • 2008

Abstract

Supervised learning assumes that labeled data are straightforward to obtain. In reality, labeled data can be scarce or expensive to obtain. Active learning (AL) addresses this problem by asking for the labels of only the most "informative" data points. We propose an AL method based on a metric of classification confidence computed on a feature subset of the original feature space, which is especially relevant to the large number of dimensions (i.e. examined genes) of microarray experiments. DNA microarray expression experiments permit the systematic study of the correlation of the expression of thousands of genes. Feature selection is critical in the algorithm because it enables faster and more robust retraining of the classifier. Feature selection combines a variance measure with a genetic algorithm. We have applied the proposed method to DNA microarray data sets with encouraging results. In particular, we studied data sets concerning small round blue cell tumours (4 types), leukemia (2 types), lung cancer (2 types) and prostate cancer (healthy, unhealthy).
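
The general idea of confidence-based querying on a reduced feature set can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the classifier or the exact confidence metric, so the nearest-centroid classifier, the distance-gap confidence score, the synthetic data and all function names below are assumptions made for illustration. The genetic-algorithm stage of feature selection is omitted; only a variance filter is shown.

```python
import random
from statistics import pvariance


def top_variance_features(X, k):
    """Indices of the k features with the largest variance over all samples."""
    n_features = len(X[0])
    variances = [pvariance([row[j] for row in X]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return ranked[:k]


def fit_centroids(X, labeled, feats):
    """Mean vector of each class, restricted to the selected features."""
    cents = {}
    for label in set(labeled.values()):
        rows = [X[i] for i, lab in labeled.items() if lab == label]
        cents[label] = [sum(r[j] for r in rows) / len(rows) for j in feats]
    return cents


def distances(x, cents, feats):
    """Euclidean distance from x to every class centroid."""
    return {
        lab: sum((x[j] - c[p]) ** 2 for p, j in enumerate(feats)) ** 0.5
        for lab, c in cents.items()
    }


def predict(x, cents, feats):
    """Label of the nearest centroid."""
    d = distances(x, cents, feats)
    return min(d, key=d.get)


def confidence(x, cents, feats):
    """Gap between the two nearest centroids; a small gap means low confidence."""
    d = sorted(distances(x, cents, feats).values())
    return d[1] - d[0]


def active_learning(X, oracle, seed_idx, n_queries, k):
    """Query the label of the least confident pool sample, then retrain."""
    feats = top_variance_features(X, k)         # variance needs no labels: done once
    labeled = {i: oracle(i) for i in seed_idx}  # seeds must cover at least two classes
    for _ in range(n_queries):
        cents = fit_centroids(X, labeled, feats)
        pool = [i for i in range(len(X)) if i not in labeled]
        if not pool:
            break
        query = min(pool, key=lambda i: confidence(X[i], cents, feats))
        labeled[query] = oracle(query)
    return feats, labeled, fit_centroids(X, labeled, feats)


# Synthetic stand-in for expression data (an assumption, not a real data set):
# 40 samples, 20 "genes", of which only the first 3 differ between the classes.
random.seed(0)
y_true = [0] * 20 + [1] * 20
X = [
    [random.gauss(6.0 * label, 1.0) for _ in range(3)]
    + [random.gauss(0.0, 0.5) for _ in range(17)]
    for label in y_true
]

feats, labeled, cents = active_learning(
    X, oracle=lambda i: y_true[i], seed_idx=[0, 20], n_queries=5, k=3
)
accuracy = sum(predict(x, cents, feats) == t for x, t in zip(X, y_true)) / len(X)
print(f"features: {sorted(feats)}, labels used: {len(labeled)}, accuracy: {accuracy:.2f}")
```

In this sketch the oracle stands in for the human expert who supplies labels on request. Restricting the classifier to a few selected features keeps each retraining step cheap, which is the role the abstract assigns to feature selection.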