Simple and effective visual models for gene expression cancer diagnostics

Authors:
Gregor Leban;Minca Mramor;Ivan Bratko;Blaz Zupan
Affiliations:
University of Ljubljana, Tržaška 25, Ljubljana, Slovenia;University of Ljubljana, Tržaška 25, Ljubljana, Slovenia;University of Ljubljana, Tržaška 25, Ljubljana, Slovenia;University of Ljubljana, Tržaška 25, Ljubljana, Slovenia
Venue:
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Year:
2005

Visualization

Abstract

We show that diagnostic classes in cancer gene expression data sets, which typically include thousands of features (genes), can be effectively separated with simple two-dimensional plots such as scatterplots and radviz graphs. The principal innovation of the paper is a method called VizRank, which scores candidate projections and identifies the best among possibly millions of them. Compared with techniques recently popular in cancer genomics, including neural networks, support vector machines, and various ensemble-based approaches, VizRank is fast and finds visualization models that domain experts can easily examine and interpret. In our experiments on a number of gene expression data sets, VizRank always found data visualizations that use a small number of genes (two to seven) and show excellent class separation. In addition to providing grounds for gene expression cancer diagnosis, VizRank and its visualizations identify small sets of relevant genes, uncover interesting gene interactions, and point to outliers and potential misclassifications in cancer data sets.
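
The abstract does not say how a projection is scored, so the following is only a minimal sketch of the general idea of ranking scatterplot projections, not the authors' VizRank implementation. It assumes that a projection is "good" when a k-nearest-neighbour classifier, evaluated by leave-one-out on the two plotted features, predicts the diagnostic class well. The function names (`knn_loo_accuracy`, `rank_scatterplots`, `make_toy_data`), the choice of k, and the synthetic data are all hypothetical.

```python
from itertools import combinations

import numpy as np


def knn_loo_accuracy(points, labels, k=5):
    """Leave-one-out k-NN accuracy for points in a low-dimensional projection.

    Assumed scoring function for this sketch; the abstract does not
    specify the measure VizRank actually uses.
    """
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point must not vote for itself
    neighbours = np.argsort(dist, axis=1)[:, :k]
    n_classes = int(labels.max()) + 1
    correct = 0
    for i, idx in enumerate(neighbours):
        votes = np.bincount(labels[idx], minlength=n_classes)
        correct += int(votes.argmax() == labels[i])
    return correct / len(labels)


def rank_scatterplots(X, y, k=5, top=3):
    """Score every feature pair as a scatterplot and return the best ones.

    Returns a list of (score, (feature_i, feature_j)), best first.
    """
    # Standardise each feature so both plot axes use a comparable scale.
    Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    scored = [
        (knn_loo_accuracy(Z[:, [i, j]], y, k), (i, j))
        for i, j in combinations(range(X.shape[1]), 2)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top]


def make_toy_data(seed=0):
    """Hypothetical data: 60 samples, 6 'genes', only gene 0 is informative."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], 30)
    X = rng.normal(size=(60, 6))
    X[:, 0] += 6.0 * y  # shift class 1 along gene 0
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    for score, pair in rank_scatterplots(X, y):
        print(pair, round(score, 3))
```

Exhaustive enumeration of pairs is feasible only for a modest number of features. With thousands of genes, a practical search would need some heuristic to decide which feature subsets to try first; the sketch leaves that out.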