On the power of topological kernel in microarray-based detection of cancer

  • Authors:
  • Vilen Jumutc;Pawel Zayakin

  • Affiliations:
  • Riga Technical University, Riga, Latvia;Latvian BioMedical Research & Study Center, Riga, Latvia

  • Venue:
  • IDEAL'10: Proceedings of the 11th International Conference on Intelligent Data Engineering and Automated Learning
  • Year:
  • 2010

Abstract

In this paper we propose a new topological kernel for the microarray-based detection of cancer. For many decades, microarrays have been a convenient approach for detecting and observing tumor-derived proteins and the genes involved. Despite this biomedical success, microarray-based diagnostics is still far from common practice in clinical biomedicine. The main obstacle is the lack of robust classification methods that can correctly diagnose unseen serum samples while remaining insensitive to the underlying distribution. This unfortunate property of microarray datasets stems from the statistically subtle difference between cancer-specific and healthy samples: only a very small number of (anti)genes have prominent tumor-driven expression values. Kernel methods such as SVM, a state-of-the-art general-purpose toolbox for classification and regression, partially address this problem. Nevertheless, poorly performed normalization or preprocessing steps can easily bias the similarity measures encoded by the SVM kernel, which prevents proper generalization to unseen data. The proposed topological kernel addresses this issue by incorporating indirect topological similarities between samples and by taking into account the ranking of every attribute within each sample. Experimental evaluations on several microarray datasets verify that the proposed kernel improves performance on poorly conditioned and even very small datasets, with statistically significant P-values. Finally, we demonstrate that the proposed kernel works even better without cross-sample normalization and rescaling of the input space.
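The abstract does not define the topological kernel itself, so the following is only a minimal sketch of one ingredient it names: ranking every attribute within each sample. It is not the authors' kernel and does not model the "indirect topological similarities" between samples. As a stand-in, it applies an ordinary Gaussian (RBF) kernel to within-sample rank vectors. This is positive semi-definite because it is an RBF on a fixed feature map.

All function names are hypothetical and chosen for illustration. So is the default `gamma = 1 / n_attributes`. The sketch does illustrate why a rank-based representation needs no cross-sample normalization: any strictly increasing rescaling of a single sample leaves its ranks, and therefore the kernel value, unchanged.

```python
import math
from typing import List, Optional, Sequence


def within_sample_ranks(sample: Sequence[float]) -> List[float]:
    """Replace each attribute by its 1-based rank within this sample.

    Tied values get the average of the ranks they span. Only the order of
    values inside one sample matters, so a strictly increasing transform of
    that sample (scaling, shifting, log) leaves the ranks unchanged.
    """
    order = sorted(range(len(sample)), key=lambda i: sample[i])
    ranks = [0.0] * len(sample)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with sample[order[i]].
        while j + 1 < len(order) and sample[order[j + 1]] == sample[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def rbf_on_ranks(rx: Sequence[float], ry: Sequence[float], gamma: float) -> float:
    """Gaussian (RBF) kernel between two already rank-transformed samples."""
    if len(rx) != len(ry):
        raise ValueError("samples must have the same number of attributes")
    sq_dist = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return math.exp(-gamma * sq_dist)


def rank_rbf_kernel(x: Sequence[float], y: Sequence[float],
                    gamma: Optional[float] = None) -> float:
    """Hypothetical rank-based kernel: RBF on within-sample ranks."""
    if gamma is None:
        gamma = 1.0 / len(x)  # hypothetical default; tune by cross-validation
    return rbf_on_ranks(within_sample_ranks(x), within_sample_ranks(y), gamma)


def gram_matrix(samples: Sequence[Sequence[float]],
                gamma: Optional[float] = None) -> List[List[float]]:
    """Kernel matrix over raw samples, ranking each sample only once."""
    ranked = [within_sample_ranks(s) for s in samples]
    if gamma is None:
        gamma = 1.0 / len(ranked[0])  # same hypothetical default as above
    n = len(ranked)
    return [[rbf_on_ranks(ranked[i], ranked[j], gamma) for j in range(n)]
            for i in range(n)]
```

For example, with `x = [0.1, 5.0, 2.0]`, `rank_rbf_kernel(x, [10 * v + 3 for v in x])` returns 1.0, because the rescaled sample has the same ranks. A Gram matrix built this way could be passed to an SVM that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`. Whether this matches the behavior reported for the paper's kernel is not established here.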