Supervised multidimensional scaling for visualization, classification, and bipartite ranking

  • Authors:
  • Daniela M. Witten;Robert Tibshirani

  • Affiliations:
  • Department of Statistics, Stanford University, 390 Serra Mall, Stanford CA 94305, USA;Departments of Statistics and Health Research & Policy, Stanford University, 390 Serra Mall, Stanford CA 94305, USA

  • Venue:
  • Computational Statistics & Data Analysis
  • Year:
  • 2011

Abstract

Least squares multidimensional scaling (MDS) is a classical method for representing an $n \times n$ dissimilarity matrix $D$. One seeks a set of configuration points $z_1, \ldots, z_n \in \mathbb{R}^S$ such that $D$ is well approximated by the Euclidean distances between the configuration points: $D_{ij} \approx \|z_i - z_j\|_2$. Suppose that in addition to $D$, a vector of associated binary class labels $y \in \{1, 2\}^n$ corresponding to the $n$ observations is available. We propose an extension to MDS that incorporates this outcome vector. Our proposal, supervised multidimensional scaling (SMDS), seeks a set of configuration points $z_1, \ldots, z_n \in \mathbb{R}^S$ such that $D_{ij} \approx \|z_i - z_j\|_2$, and such that $z_{is} > z_{js}$ for $s = 1, \ldots, S$ tends to occur when $y_i > y_j$. This results in a new way to visualize the observations. In addition, we show that SMDS leads to a method for the classification of test observations, which can also be interpreted as a solution to the bipartite ranking problem. This method is explored in a simulation study, as well as on a prostate cancer gene expression data set and on a handwritten digits data set.
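To make the two criteria in the abstract concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' SMDS objective or fitting algorithm, which the abstract does not specify. It only evaluates, for a given configuration, (a) the least squares misfit between $D$ and the Euclidean distances, and (b) how often $z_{is} > z_{js}$ holds when $y_i > y_j$. The function names `raw_stress` and `ordering_agreement`, and the toy data, are assumptions introduced for illustration.

```python
import numpy as np


def pairwise_distances(Z):
    """Euclidean distances ||z_i - z_j||_2 between all rows of Z (n x S)."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def raw_stress(D, Z):
    """Least squares misfit: sum over pairs i < j of (D_ij - ||z_i - z_j||_2)^2."""
    resid = D - pairwise_distances(Z)
    upper = np.triu_indices(D.shape[0], k=1)  # count each pair once
    return float((resid[upper] ** 2).sum())


def ordering_agreement(Z, y):
    """Fraction of (i, j, s) with y_i > y_j for which z_is > z_js.

    Illustrative diagnostic only (an assumption of this sketch),
    not a quantity defined in the source.
    """
    higher = y[:, None] > y[None, :]          # pairs (i, j) with y_i > y_j
    if not higher.any():
        return float("nan")
    coord_gt = Z[:, None, :] > Z[None, :, :]  # z_is > z_js for each dimension s
    return float(coord_gt[higher].mean())


# Toy example (hypothetical data): three points in S = 2 dimensions.
Z = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
y = np.array([1, 2, 2])
D = pairwise_distances(Z)  # dissimilarities that Z reproduces exactly

print(raw_stress(D, Z))          # zero misfit, since D was built from Z
print(ordering_agreement(Z, y))  # every class-2 point exceeds the class-1 point
```

A configuration with low `raw_stress` and high `ordering_agreement` matches the qualitative goal described above: distances are preserved while the coordinates tend to be ordered by class label.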