2008 Special Issue: Interactive data analysis and clustering of genomic data

  • Authors:
  • A. Ciaramella;S. Cocozza;F. Iorio;G. Miele;F. Napolitano;M. Pinelli;G. Raiconi;R. Tagliaferri

  • Affiliations:
  • Department of Applied Sciences University of Naples Parthenope, I-80143, Centro Direzionale, Naples, Italy;CDGU, Department of Cellular and Molecular Biology and Pathology "L. Califano" University "Federico II", Naples, Italy;Telethon Institute of Genetics and Medicine, Via P. Castellino, 111 80131, Naples, Italy;Department of Physical Sciences, University of Naples, I-80126, via Cintia 6, Naples, Italy;Department of Mathematics and Informatics, University of Salerno, via Ponte Don Melillo, 84084 Fisciano (SA), Italy;CDGU, Department of Cellular and Molecular Biology and Pathology "L. Califano" University "Federico II", Naples, Italy;Department of Mathematics and Informatics, University of Salerno, via Ponte Don Melillo, 84084 Fisciano (SA), Italy;Department of Mathematics and Informatics, University of Salerno, via Ponte Don Melillo, 84084 Fisciano (SA), Italy

  • Venue:
  • Neural Networks
  • Year:
  • 2008

Abstract

In this work, a new clustering approach is used to explore a well-known dataset [Whitfield, M. L., Sherlock, G., Saldanha, A. J., Murray, J. I., Ball, C. A., Alexander, K. E., et al. (2002). Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Molecular Biology of the Cell, 13, 1977-2000] of time-dependent gene expression profiles in the human cell cycle. The approach is a multi-step procedure: after preprocessing, parameters are chosen using data subsampling and stability measures; for each model, several clustering solutions are obtained by random initialization and selected on the basis of a similarity measure and a figure of merit; finally, the selected solutions are tuned by evaluating a reliability measure. Three clustering models are compared: K-means, Self-Organizing Maps and Probabilistic Principal Surfaces. The comparative analysis considers the similarity between the best solutions obtained with the three methods, the absolute distortion value, and validation through Gene Ontology (GO) annotations. The GO annotations are used to assess the significance of the obtained clusters and to compare the results with those obtained in the work cited above.
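The parameter-selection step (data subsampling plus a stability measure) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the stability measure, so the Rand index is used here as a stand-in, and the function names (`kmeans`, `rand_index`, `stability`), the 80% subsampling fraction and the number of subsample pairs are all assumptions made for illustration. The idea is to cluster pairs of random subsamples with the same number of clusters and score how consistently the points shared by both subsamples are grouped.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's algorithm with random initialization; returns cluster labels."""
    X = np.asarray(X, dtype=float)
    # Fancy indexing copies the rows, so updating centers leaves X untouched.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels

def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same/different cluster)."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), k=1)  # each unordered pair once
    return float((same_a[iu] == same_b[iu]).mean())

def stability(X, k, n_pairs=10, frac=0.8, seed=0):
    """Mean agreement between clusterings of random subsample pairs (assumed measure)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    m = int(frac * n)
    scores = []
    for _ in range(n_pairs):
        i1 = rng.choice(n, size=m, replace=False)
        i2 = rng.choice(n, size=m, replace=False)
        l1 = np.full(n, -1)
        l2 = np.full(n, -1)
        l1[i1] = kmeans(X[i1], k, rng)
        l2[i2] = kmeans(X[i2], k, rng)
        shared = np.intersect1d(i1, i2)  # points present in both subsamples
        scores.append(rand_index(l1[shared], l2[shared]))
    return float(np.mean(scores))

if __name__ == "__main__":
    # Synthetic stand-in for expression profiles: three separated groups.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(c, 0.1, size=(30, 2)) for c in (0.0, 5.0, 10.0)])
    for k in range(2, 6):
        print(k, round(stability(X, k), 3))
```

Under this sketch, a candidate number of clusters whose stability score is high relative to the others would be preferred. The raw Rand index tends to grow with the number of clusters, so a chance-corrected variant may be a better choice in practice.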