Hierarchical clustering, languages and cancer

  • Authors:
  • Pritha Mahata;Wagner Costa;Carlos Cotta;Pablo Moscato

  • Affiliations:
  • Newcastle Bioinformatics Initiative, School of Electrical Engineering and Computer Science, The University of Newcastle, Callaghan, NSW, Australia;Newcastle Bioinformatics Initiative, School of Electrical Engineering and Computer Science, The University of Newcastle, Callaghan, NSW, Australia;Dept. Lenguajes y Ciencias de la Computación, University of Málaga, ETSI Informática, Málaga, Spain;Newcastle Bioinformatics Initiative, School of Electrical Engineering and Computer Science, The University of Newcastle, Callaghan, NSW, Australia

  • Venue:
  • EuroGP'06 Proceedings of the 2006 international conference on Applications of Evolutionary Computing
  • Year:
  • 2006

Abstract

In this paper, we introduce a novel objective function for the hierarchical clustering of data from distance matrices, a highly relevant task in bioinformatics. To assess the robustness of the method, we apply it in two areas: (a) deriving a phylogeny of languages and (b) classifying cancer subtypes from microarray data. For comparison, we also consider ultrametric trees (generated via a two-phase evolutionary approach that creates a large number of hypothesis trees and then takes a consensus) and the best-known results from the literature. For the first problem, we used a dataset of measured 'separation times' among 84 Indo-European languages. The hierarchy we produce agrees very well with existing data about these languages across a wide range of levels, and it helps to clarify existing hypotheses and raise new ones about the evolution of these languages. Our method also generated a classification tree for the different cancers in the NCI60 microarray dataset (comprising gene expression data for 60 cancer cell lines). In this case, the method seems to support the current belief about the heterogeneous nature of ovarian, breast and non-small-cell lung cancers, as opposed to the relative homogeneity of other cancer types. However, our method also reveals a close relationship between the melanoma and central nervous system (CNS) cell lines. This is consistent with the fact that metastatic melanoma first appears in the CNS.
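To make the general setting concrete, the following is a minimal sketch of hierarchical clustering from a distance matrix. It uses standard average-linkage agglomeration (UPGMA), a textbook way to obtain an ultrametric tree; it is not the objective function introduced in the paper, nor the two-phase evolutionary approach used for comparison. The function name `average_linkage`, the labels `A` to `D`, and the toy distances are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def average_linkage(dist, labels):
    """Agglomerative clustering (UPGMA) on a symmetric distance matrix.

    Returns the tree as nested tuples and the list of merge heights.
    """
    # Each active cluster maps an id to (subtree, indices of its members).
    clusters = {i: (labels[i], [i]) for i in range(len(labels))}
    heights = []
    next_id = len(labels)

    def cluster_distance(a, b):
        # Mean pairwise distance between the members of two clusters.
        members_a, members_b = clusters[a][1], clusters[b][1]
        total = sum(dist[i][j] for i in members_a for j in members_b)
        return total / (len(members_a) * len(members_b))

    while len(clusters) > 1:
        # Merge the closest pair of active clusters.
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda pair: cluster_distance(*pair))
        heights.append(cluster_distance(a, b))
        tree_a, members_a = clusters.pop(a)
        tree_b, members_b = clusters.pop(b)
        clusters[next_id] = ((tree_a, tree_b), members_a + members_b)
        next_id += 1

    tree, _ = next(iter(clusters.values()))
    return tree, heights


# Toy, made-up distances between four hypothetical items (not real data).
labels = ["A", "B", "C", "D"]
dist = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]

tree, heights = average_linkage(dist, labels)
print(tree)
print(heights)
```

In this toy matrix, A and B are closest, C joins them next, and D joins last, so the merge heights increase monotonically, as expected for an ultrametric tree. A real analysis would replace the toy matrix with, for example, pairwise language separation times or distances between gene expression profiles.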