Simultaneous classification and relevant feature identification in high-dimensional spaces: application to molecular profiling data

  • Authors:
  • C. Bhattacharyya;L. R. Grate;A. Rizki;D. Radisky;F. J. Molina;M. I. Jordan;M. J. Bissell;I. S. Mian

  • Affiliations:
  • Division of Computer Science, University of California Berkeley, Berkeley, CA and Department of CSA, Indian Institute of Science, Bangalore 560012, India;Lawrence Berkeley National Laboratory, Life Sciences Division, Berkeley, CA;Lawrence Berkeley National Laboratory, Life Sciences Division, Berkeley, CA;Lawrence Berkeley National Laboratory, Life Sciences Division, Berkeley, CA;Lawrence Berkeley National Laboratory, Life Sciences Division, Berkeley, CA and Department of Mathematics, University of California Santa Cruz, Santa Cruz, CA;Division of Computer Science, University of California Berkeley, Berkeley, CA and Department of Statistics, University of California Berkeley, Berkeley, CA;Lawrence Berkeley National Laboratory, Life Sciences Division, Berkeley, CA;Lawrence Berkeley National Laboratory, Life Sciences Division, Berkeley, CA

  • Venue:
  • Signal Processing - Special issue: Genomic signal processing
  • Year:
  • 2003

Abstract

Molecular profiling technologies monitor many thousands of transcripts, proteins, metabolites or other species concurrently in a biological sample of interest. Given such high-dimensional data for different types of samples, classification methods aim to assign specimens to known categories. Relevant feature identification methods seek to define a subset of molecules that differentiate the samples. This work describes LIKNON, a specific implementation of a statistical approach for creating a classifier and identifying a small number of relevant features simultaneously. Given two-class data, LIKNON estimates a sparse linear classifier by exploiting the simple and well-known property that minimising an L1 norm (via linear programming) yields a sparse hyperplane. It performs well when used for retrospective analysis of three cancer biology profiling data sets: (i) small, round, blue cell tumour transcript profiles from tumour biopsies and cell lines; (ii) sporadic breast carcinoma transcript profiles from patients with distant metastases
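
The property the abstract relies on, that minimising an L1 norm by linear programming tends to give a sparse hyperplane, can be illustrated with a small sketch. This is not the LIKNON code: it assumes one common soft-margin formulation (minimise ||w||_1 + C * sum of slacks, subject to y_i (w.x_i + b) >= 1 - slack_i), which may differ in detail from the paper's. The function name fit_l1_linear_classifier, the penalty C, the toy data and the use of NumPy and SciPy's linprog are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog


def fit_l1_linear_classifier(X, y, C=10.0):
    """Fit sign(w.x + b) by minimising ||w||_1 + C * sum(slack) as an LP.

    X: (n_samples, n_features) array; y: labels in {-1, +1}.
    Hypothetical helper for illustration, not the authors' implementation.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape

    # Variable layout: [u (d), v (d), b_pos, b_neg, slack (n)], all >= 0,
    # with w = u - v and b = b_pos - b_neg so that |w_j| becomes linear.
    cost = np.concatenate([np.ones(2 * d), np.zeros(2), C * np.ones(n)])

    # Margin constraints y_i (w.x_i + b) >= 1 - slack_i, rewritten as
    # -y_i (x_i.u - x_i.v + b_pos - b_neg) - slack_i <= -1.
    yX = y[:, None] * X
    A_ub = np.hstack([-yX, yX, -y[:, None], y[:, None], -np.eye(n)])
    b_ub = -np.ones(n)

    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)

    w = res.x[:d] - res.x[d:2 * d]
    b = res.x[2 * d] - res.x[2 * d + 1]
    return w, b


# Toy two-class data (made up): feature 0 separates the classes,
# features 1-3 are small noise.
X_demo = np.array([
    [2.0, 0.1, -0.2, 0.0],
    [3.0, -0.1, 0.1, 0.2],
    [2.5, 0.0, 0.2, -0.1],
    [-2.0, 0.1, 0.1, 0.1],
    [-3.0, -0.2, -0.1, 0.0],
    [-2.5, 0.0, 0.0, -0.2],
])
y_demo = np.array([1, 1, 1, -1, -1, -1])

w_demo, b_demo = fit_l1_linear_classifier(X_demo, y_demo)
# Features with non-negligible weight are the candidate "relevant" features.
selected = np.flatnonzero(np.abs(w_demo) > 1e-8)
print("weights:", w_demo, "offset:", b_demo, "selected features:", selected)
```

In this formulation the features with non-zero weight play the role of the relevant features, so classification and feature identification come out of the same optimisation. How many weights are driven to zero depends on the data and on the choice of C.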