An effective double-bounded tree-connected Isomap algorithm for microarray data classification

  • Authors:
  • C. Orsenigo;C. Vercellis

  • Affiliations:
  • Dept. of Management, Economics and Industrial Engineering, Politecnico di Milano Via Lambruschini 4b, 20156 Milano, Italy;Dept. of Management, Economics and Industrial Engineering, Politecnico di Milano Via Lambruschini 4b, 20156 Milano, Italy

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2012

Abstract

Isometric mapping (Isomap) is a popular nonlinear dimensionality reduction technique that has shown high potential in visualization and classification. However, it appears sensitive to noise and to scarcity of observations. This inadequacy may hinder its application to the classification of microarray data, in which the expression levels of thousands of genes are measured in a few normal and tumor tissue samples. In this paper we propose a double-bounded tree-connected variant of Isomap, aimed at being more robust to noise and outliers when used for classification, and also computationally more efficient. It differs from the original Isomap in the way the neighborhood graph is generated: in the first stage we apply a double-bounding rule that confines the search to at most k nearest neighbors contained within an ε-radius hypersphere; the resulting subgraphs are then joined by computing a minimum spanning tree among the connected components. We therefore obtain a connected graph without unnaturally inflating the values of k and ε. The computational experiments show that the new method achieves significantly higher accuracy than Isomap, k-edge-connected Isomap and the direct application of support vector machines to data in the input space, consistently across the seven microarray datasets considered in our tests.
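The two-stage graph construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `double_bounded_graph` and `connect_components`, the use of Euclidean distance, and the choice of the shortest point-to-point link as the distance between two components are all assumptions made for illustration, since the abstract does not specify them. The sketch also assumes that no two observations coincide, because a zero weight is read as "no edge".

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial.distance import cdist


def double_bounded_graph(X, k, eps):
    """Stage 1 (assumed form): link each point to at most k nearest
    neighbors, keeping only those that lie within distance eps."""
    D = cdist(X, X)  # pairwise Euclidean distances (an assumption)
    n = len(X)
    W = np.zeros((n, n))  # symmetric weights; 0 means "no edge"
    for i in range(n):
        order = np.argsort(D[i])
        order = order[order != i][:k]      # k nearest, excluding i itself
        keep = order[D[i, order] <= eps]   # second bound: eps-radius ball
        W[i, keep] = D[i, keep]
        W[keep, i] = D[i, keep]
    return D, W


def connect_components(D, W):
    """Stage 2 (assumed form): join the connected components with the
    edges of a minimum spanning tree built over the components."""
    n_comp, labels = connected_components(csr_matrix(W), directed=False)
    if n_comp == 1:
        return W.copy()
    C = np.zeros((n_comp, n_comp))  # component-level distances (upper triangle)
    link = {}                       # closest pair of points per component pair
    for a in range(n_comp):
        ia = np.where(labels == a)[0]
        for b in range(a + 1, n_comp):
            ib = np.where(labels == b)[0]
            sub = D[np.ix_(ia, ib)]
            r, c = np.unravel_index(np.argmin(sub), sub.shape)
            C[a, b] = sub[r, c]
            link[(a, b)] = (int(ia[r]), int(ib[c]))
    mst = minimum_spanning_tree(csr_matrix(C)).tocoo()
    W = W.copy()
    for a, b in zip(mst.row, mst.col):
        a, b = int(a), int(b)
        i, j = link[(min(a, b), max(a, b))]
        W[i, j] = W[j, i] = D[i, j]  # add the bridging edge
    return W


# Toy data: two well-separated clusters of three points each.
X = np.array([[0, 0], [1, 0], [0, 1],
              [10, 10], [11, 10], [10, 11]], dtype=float)
D, W = double_bounded_graph(X, k=2, eps=2.0)
W_connected = connect_components(D, W)
```

On the toy data, the first stage leaves the two clusters as separate components, and the second stage adds a single bridging edge. The geodesic distances that Isomap needs could then be computed on `W_connected`, for example with `scipy.sparse.csgraph.shortest_path`, without having to raise k or ε until the graph happens to become connected.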