Optimized multilayer perceptrons for molecular classification and diagnosis using genomic data

  • Authors:
  • Zuyi Wang;Yue Wang;Jianhua Xuan;Yibin Dong;Marina Bakay;Yuanjian Feng;Robert Clarke;Eric P. Hoffman

  • Affiliations:
  • Center for Genetic Medicine, Children's National Medical Center Washington, DC 20010, USA;The Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University Arlington, VA 22203, USA;Department of Electrical Engineering and Computer Science, The Catholic University of America Washington, DC 20064, USA;The Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University Arlington, VA 22203, USA;Center for Genetic Medicine, Children's National Medical Center Washington, DC 20010, USA;The Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University Arlington, VA 22203, USA;Departments of Oncology, Physiology and Biophysics, Lombardi Comprehensive Cancer Center, Georgetown University Washington, DC 20007, USA;Center for Genetic Medicine, Children's National Medical Center Washington, DC 20010, USA

  • Venue:
  • Bioinformatics
  • Year:
  • 2006

Quantified Score

Hi-index 3.84

Abstract

Motivation: Multilayer perceptrons (MLPs) are among the widely used and effective machine learning methods currently applied to diagnostic classification based on high-dimensional genomic data. Because the dimensionality of existing genomic data often exceeds the available sample size by orders of magnitude, MLP performance may degrade owing to the curse of dimensionality and over-fitting, and may not provide acceptable prediction accuracy.

Results: Based on Fisher linear discriminant analysis, we designed and implemented an optimization scheme for a two-layer MLP that effectively optimizes both the initialization of the MLP parameters and the MLP architecture. The optimized MLP consistently demonstrated its ability to ease the curse of dimensionality in large microarray datasets. In comparison with a conventional MLP using random initialization, we obtained significant improvements in major performance measures, including Bayes classification accuracy, convergence properties and area under the receiver operating characteristic curve (Az).

Supplementary information: The Supplementary information is available at http://www.cbil.ece.vt.edu/publications.htm

Contact: yuewang@vt.edu
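The abstract names Fisher linear discriminant analysis as the basis for initializing the MLP but does not give the details of the scheme. The sketch below is therefore only an illustration of the general idea, not the authors' method: it computes a two-class Fisher discriminant direction and uses it to seed one first-layer hidden unit, leaving the remaining units with small random weights. The function names `fisher_direction` and `init_first_layer`, the ridge term, and the choice to seed a single unit are all assumptions made for this example.

```python
import numpy as np

def fisher_direction(X, y, ridge=1e-3):
    """Two-class Fisher discriminant direction, w proportional to Sw^{-1} (m1 - m0).

    Hypothetical helper: the ridge term is an assumption added so that the
    within-class scatter stays invertible when features outnumber samples.
    """
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter matrix summed over both classes.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    Sw += ridge * np.eye(X.shape[1])
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)

def init_first_layer(X, y, n_hidden, seed=0):
    """Seed hidden unit 0 with the Fisher direction (illustrative assumption)."""
    rng = np.random.default_rng(seed)
    w = fisher_direction(X, y)
    # Remaining units start from small random weights.
    W = rng.normal(scale=0.01, size=(n_hidden, X.shape[1]))
    W[0] = w
    # Bias puts the unit's zero crossing midway between the projected class means.
    b = np.zeros(n_hidden)
    m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    b[0] = -w @ (m0 + m1) / 2.0
    return W, b
```

With a samples-by-genes matrix `X` and labels `y` in {0, 1}, `W, b = init_first_layer(X, y, n_hidden=3)` returns arrays that could replace the random starting weights of a first layer before ordinary gradient training.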