Simple and effective visual models for gene expression cancer diagnostics

  • Authors:
  • Gregor Leban;Minca Mramor;Ivan Bratko;Blaz Zupan

  • Affiliations:
  • University of Ljubljana, Tržaška 25, Ljubljana, Slovenia;University of Ljubljana, Tržaška 25, Ljubljana, Slovenia;University of Ljubljana, Tržaška 25, Ljubljana, Slovenia;University of Ljubljana, Tržaška 25, Ljubljana, Slovenia

  • Venue:
  • Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
  • Year:
  • 2005

Abstract

In this paper we show that diagnostic classes in cancer gene expression data sets, which most often include thousands of features (genes), may be effectively separated with simple two-dimensional plots such as the scatterplot and the radviz graph. The principal innovation proposed in the paper is a method called VizRank, which scores candidate projections and identifies the best among possibly millions of them for visualization. Compared to techniques that have recently been widely applied in cancer genomics, including neural networks, support vector machines and various ensemble-based approaches, VizRank is fast and finds visualization models that domain experts can easily examine and interpret. In our experiments on a number of gene expression data sets, VizRank was always able to find data visualizations that use a small number of genes (two to seven) and show excellent class separation. In addition to providing grounds for gene expression cancer diagnosis, VizRank and its visualizations also identify small sets of relevant genes, uncover interesting gene interactions and point to outliers and potential misclassifications in cancer data sets.
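
The idea of scoring and ranking candidate projections can be illustrated with a minimal sketch. The abstract does not say how a projection is scored, so this example assumes leave-one-out k-nearest-neighbour accuracy in the two-dimensional scatterplot as the measure of class separation. The function names (`knn_loo_accuracy`, `rank_scatterplots`, `make_toy_data`) and the synthetic data are hypothetical and are not taken from the paper. The example is an exhaustive search over feature pairs on toy data, not the authors' implementation.

```python
from collections import Counter
from itertools import combinations
import random


def knn_loo_accuracy(points, labels, k=3):
    """Leave-one-out k-NN accuracy for points in a 2-D projection.

    Assumption: accuracy stands in for the projection score; the
    abstract does not specify the scoring function.
    """
    correct = 0
    for i, (xi, yi) in enumerate(points):
        # Squared Euclidean distance from point i to every other point.
        dists = sorted(
            ((xi - xj) ** 2 + (yi - yj) ** 2, labels[j])
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        votes = Counter(label for _, label in dists[:k])
        if votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct / len(points)


def rank_scatterplots(rows, labels, k=3):
    """Score every feature pair; return [(score, (a, b)), ...], best first."""
    n_features = len(rows[0])
    scored = []
    for a, b in combinations(range(n_features), 2):
        projection = [(row[a], row[b]) for row in rows]
        scored.append((knn_loo_accuracy(projection, labels, k), (a, b)))
    # Stable sort: ties keep the order in which pairs were generated.
    scored.sort(key=lambda item: -item[0])
    return scored


def make_toy_data(n_per_class=20, n_noise=4, seed=0):
    """Synthetic data: features 0 and 1 separate the classes, the rest are noise."""
    rng = random.Random(seed)
    rows, labels = [], []
    for label, centre in (("A", 0.0), ("B", 5.0)):
        for _ in range(n_per_class):
            informative = [rng.gauss(centre, 0.5), rng.gauss(centre, 0.5)]
            noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
            rows.append(informative + noise)
            labels.append(label)
    return rows, labels


rows, labels = make_toy_data()
ranking = rank_scatterplots(rows, labels)
best_score, best_pair = ranking[0]
print("best pair:", best_pair, "score:", best_score)
```

With thousands of genes the number of pairs grows quadratically, and projections with more features grow faster still, so an exhaustive loop like this one would have to be replaced by some form of search heuristic in practice.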