Feature and model selection with discriminatory visualization for diagnostic classification of brain tumors

  • Authors:
  • Félix F. González-Navarro;Lluís A. Belanche-Muñoz;Enrique Romero;Alfredo Vellido;Margarida Julià-Sapé;Carles Arús

  • Affiliations:
  • Dept. de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya (UPC), Barcelona, Spain;Dept. de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya (UPC), Barcelona, Spain;Dept. de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya (UPC), Barcelona, Spain;Dept. de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya (UPC), Barcelona, Spain;Centro de Investigación Biomédica en Red en Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN), Cerdanyola del Vallès, Spain and Grup d'Aplicacions Biomèdiques de la ...;Grup d'Aplicacions Biomèdiques de la RMN (GABRMN), Dept. de Bioquímica i Biología Molecular (BBM), Unitat de Biociències, Universitat Autònoma de Barcelona (UAB), Cerdanyo ...

  • Venue:
  • Neurocomputing
  • Year:
  • 2010

Abstract

Machine Learning (ML) and related methods have recently made significant contributions to solving multidisciplinary problems in oncology diagnosis. Human brain tumor diagnosis, in particular, often relies on non-invasive techniques such as Magnetic Resonance Imaging (MRI) and Magnetic Resonance Spectroscopy (MRS). In this paper, MRS data of human brain tumors are analyzed in detail. The high dimensionality of the MR spectra complicates both their classification and the interpretation of the results, which limits their usability in practical medical settings. The use of dimensionality reduction techniques is therefore advisable. In this work, we apply feature selection methods and several off-the-shelf classifiers to various ¹H-MRS modalities: long echo time, short echo time, and an ad hoc combination of both. Bootstrap resampling is used to obtain mean performance estimates together with their variability. Our experimental findings indicate that feature selection improves classification performance compared to using the full set of features. We also show that combining information from the different echo times is the better strategy when only a small number of spectral frequencies is used; however, as the number of short echo time frequencies grows, many models with similar performance can be obtained. The final induced models offer very attractive solutions in terms of both prediction accuracy and the number of spectral frequencies involved, and the selected frequencies are also amenable to metabolic interpretation. A linear dimensionality-reduction technique that preserves class discrimination capabilities is used to visualize the data corresponding to the selected frequencies.
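
The bootstrap evaluation mentioned in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data are synthetic Gaussian samples rather than MR spectra, a nearest-centroid classifier stands in for the unspecified off-the-shelf classifiers, models are scored on the out-of-bag samples of each bootstrap round, and all function names are hypothetical. The `features` argument only mimics the idea of comparing a selected subset of frequencies against the full set.

```python
import random
import statistics


def select(row, features):
    """Keep only the listed column indices (all columns if features is None)."""
    return row if features is None else [row[j] for j in features]


def fit_centroids(X, y):
    """Return {label: centroid} computed from the training rows."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(centroids, row):
    """Assign the label of the closest centroid (squared Euclidean distance)."""
    def sqdist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda lab: sqdist(centroids[lab]))


def bootstrap_accuracy(X, y, features=None, n_rounds=200, seed=0):
    """Mean and standard deviation of out-of-bag accuracy over bootstrap rounds."""
    rng = random.Random(seed)
    Xs = [select(row, features) for row in X]
    n = len(Xs)
    scores = []
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        in_bag = set(idx)
        oob = [i for i in range(n) if i not in in_bag]
        if not oob:                                  # no held-out rows this round
            continue
        model = fit_centroids([Xs[i] for i in idx], [y[i] for i in idx])
        hits = sum(predict(model, Xs[i]) == y[i] for i in oob)
        scores.append(hits / len(oob))
    return statistics.mean(scores), statistics.stdev(scores)


def make_toy_data(n_per_class=30, seed=1):
    """Synthetic stand-in data: two informative columns and one pure-noise column."""
    rng = random.Random(seed)
    X, y = [], []
    for label, shift in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(shift, 1.0), rng.gauss(shift, 1.0), rng.gauss(0.0, 1.0)])
            y.append(label)
    return X, y


X, y = make_toy_data()
mean_all, sd_all = bootstrap_accuracy(X, y)                   # full feature set
mean_sel, sd_sel = bootstrap_accuracy(X, y, features=[0, 1])  # hypothetical selected subset
print(f"all features:      {mean_all:.3f} +/- {sd_all:.3f}")
print(f"selected features: {mean_sel:.3f} +/- {sd_sel:.3f}")
```

Reporting the spread alongside the mean is the point of the resampling: two feature subsets whose mean accuracies differ by less than their bootstrap variability cannot be ranked with confidence, which fits the abstract's remark that many models reach similar performance.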