Clustering and visualization approaches for human cell cycle gene expression data analysis

  • Authors:
  • F. Napolitano;G. Raiconi;R. Tagliaferri;A. Ciaramella;A. Staiano;G. Miele

  • Affiliations:
  • Department of Mathematics and Informatics, University of Salerno, I-84084, via Ponte don Melillo, Fisciano (SA), Italy;Department of Mathematics and Informatics, University of Salerno, I-84084, via Ponte don Melillo, Fisciano (SA), Italy;Department of Mathematics and Informatics, University of Salerno, I-84084, via Ponte don Melillo, Fisciano (SA), Italy;Department of Applied Sciences, University of Naples “Parthenope”, I-80133, via A. de Gasperi 5, Napoli, Italy;Department of Applied Sciences, University of Naples “Parthenope”, I-80133, via A. de Gasperi 5, Napoli, Italy;Department of Physical Sciences, University of Naples, I-80136, via Cintia 6, Napoli, Italy and INFN, Unit of Naples, Napoli, Italy

  • Venue:
  • International Journal of Approximate Reasoning
  • Year:
  • 2008

Abstract

This work introduces a comprehensive multi-step machine learning framework for data mining and data visualization. The approach consists of three steps: preprocessing, clustering, and visualization. Preprocessing uses a Robust Principal Component Analysis Neural Network to extract features from unevenly sampled data. A Probabilistic Principal Surfaces approach, combined with an agglomerative procedure based on Fisher's and Negentropy information, is then applied for clustering and labeling. Finally, a Multi-Dimensional Scaling approach provides a 2-dimensional visualization of the clustered and labeled data. The method offers a user-friendly visualization interface in both 2 and 3 dimensions, works on noisy data with missing points, and automatically determines the number of clusters present in the data with no a priori assumptions. As a test case, the framework is applied to the analysis and identification of genes periodically expressed in a human cancer cell line (HeLa) using cDNA microarrays.
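
To make the final visualization step concrete, the following is a minimal sketch of classical (Torgerson) Multi-Dimensional Scaling in NumPy. The abstract does not say which MDS variant the authors use, so the classical variant is an assumption here, and the names `classical_mds`, `pairwise_distances`, and `profiles` are hypothetical. The toy data are synthetic random points, not the HeLa measurements.

```python
import numpy as np


def pairwise_distances(x):
    """Euclidean distances between all pairs of rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))


def classical_mds(points, n_components=2):
    """Embed the rows of `points` with classical (Torgerson) MDS.

    Assumption: this is a stand-in for the MDS step named in the
    abstract, not the authors' implementation.
    """
    x = np.asarray(points, dtype=float)
    d2 = pairwise_distances(x) ** 2
    n = d2.shape[0]
    # Double centering turns squared distances into a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering
    # Keep the eigenvectors belonging to the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip tiny negative eigenvalues caused by rounding error.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Synthetic toy data: 12 points lying on a 2-D plane inside a 5-D space.
rng = np.random.default_rng(0)
plane = rng.normal(size=(12, 2))
basis = np.linalg.qr(rng.normal(size=(5, 2)))[0]  # 5x2, orthonormal columns
profiles = plane @ basis.T                        # shape (12, 5)

embedding = classical_mds(profiles, n_components=2)
```

Because the toy points lie exactly on a plane, the 2-dimensional embedding preserves all pairwise distances. Real expression profiles would not, in which case the embedding is only the best low-dimensional approximation in the classical MDS sense.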