Gene expression data analysis of human lymphoma using support vector machines and output coding ensembles

  • Authors:
  • Giorgio Valentini

  • Affiliations:
  • Dipartimento di Informatica e Scienze dell'Informazione (DISI), Università di Genova, via Dodecaneso 35, 16146 Genova, Italy and Istituto Nazionale di Fisica della Materia (INFM), via Dodeca ...

  • Venue:
  • Artificial Intelligence in Medicine
  • Year:
  • 2002

Abstract

The large amount of data generated by DNA microarrays was originally analysed using unsupervised methods, such as clustering or self-organizing maps. More recently, supervised methods such as decision trees, dot-product support vector machines (SVM) and multi-layer perceptrons (MLP) have been applied to classify normal and tumoural tissues. We propose methods based on non-linear SVMs with polynomial and Gaussian kernels, and on output coding (OC) ensembles of learning machines, to separate normal from malignant tissues, to classify different types of lymphoma, and to analyse the role of sets of coordinately expressed genes in carcinogenic processes of lymphoid tissues. Using gene expression data from the "Lymphochip", a specialised DNA microarray developed at Stanford University School of Medicine, we show that SVMs can correctly separate normal from tumoural tissues, and that OC ensembles can be successfully used to classify different types of lymphoma. Moreover, we identify a group of coordinately expressed genes related to the separation of two distinct subgroups within diffuse large B-cell lymphoma (DLBCL), validating Alizadeh's earlier hypothesis that DLBCL comprises two distinct diseases.
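
As a rough illustration of the two ingredients named in the abstract, the sketch below writes out the standard polynomial and Gaussian kernel functions and a minimum-Hamming-distance decoder of the kind commonly used with output coding ensembles. It is not the paper's implementation: the codeword matrix, the class labels (`class_A`, `class_B`, `class_C`), the parameter defaults and all function names are hypothetical, and the binary classifiers that would produce the output bits (for example, kernel SVMs) are assumed to exist elsewhere.

```python
import math


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel K(x, y) = (x . y + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel K(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


# Hypothetical codeword matrix: one row per class, one column per
# binary classifier (dichotomizer). Entry +1 / -1 says on which side
# of that dichotomy the class falls. The rows differ pairwise in at
# least 3 positions, so one wrong binary output can be absorbed.
CODEBOOK = {
    "class_A": (+1, +1, +1, -1, -1),
    "class_B": (+1, -1, -1, +1, -1),
    "class_C": (-1, +1, -1, -1, +1),
}


def hamming(u, v):
    """Number of positions in which two codewords differ."""
    return sum(a != b for a, b in zip(u, v))


def decode(outputs):
    """Return the class whose codeword is closest to the ensemble outputs.

    `outputs` holds the +1 / -1 decisions of the binary classifiers.
    Ties go to the class listed first in CODEBOOK.
    """
    return min(CODEBOOK, key=lambda label: hamming(CODEBOOK[label], outputs))


if __name__ == "__main__":
    # Third binary classifier is wrong for a class_A sample; decoding
    # still recovers class_A because its codeword remains the nearest.
    print(decode((+1, +1, -1, -1, -1)))
    print(polynomial_kernel((1.0, 2.0), (3.0, 4.0)))
    print(gaussian_kernel((1.0, 0.0), (0.0, 0.0)))
```

In this sketch the redundancy in the codewords is what gives the ensemble some tolerance to mistakes by individual binary classifiers; a longer code with larger row separation would tolerate more errors at the cost of training more classifiers.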