Comparative study on proximity indices for cluster analysis of gene expression time series

  • Authors:
  • Ivan G. Costa; Francisco A. T. de Carvalho; Marcílio C. P. de Souto

  • Affiliations:
  • (Correspd.) Centre of Informatics, Federal University of Pernambuco, Av. Professor Luis Freire, s/n, 50740-540, Recife, PE, Brazil. E-mail: {igcf, fatc}@cin.ufpe.br
  • ICMC, University of São Paulo, Caixa Postal 668, 13560-970, São Carlos, SP, Brazil. E-mail: marcilio@icmc.usp.br

  • Venue:
  • Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology - SBRN'02
  • Year:
  • 2002

Abstract

In the computational analysis of gene expression time series, the main factor in finding co-expressed genes is the proximity (similarity or dissimilarity) index used by the clustering method. In this context, the proximity index should identify genes that have similar patterns of expression change over time. A number of proximity indices have been used for this task. However, most of the works that employ them have emphasized the biological results, without critically evaluating the suitability of the chosen proximity index. As a consequence, there is so far no validation study of which proximity indices are most suitable for the analysis of gene expression time series. This work therefore presents a comparative study of proximity indices widely used in the literature. More specifically, versions of three distinct proximity indices are compared: Euclidean distance, Pearson correlation and angular separation. To evaluate the results, an adaptation of the k-fold cross-validation procedure suitable for unsupervised methods is used. The accuracy of each proximity index is assessed with an external index, which measures the agreement between the clustering results and gene annotation data.
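The three indices named above can be sketched as follows for two expression profiles sampled at the same time points. This is a minimal illustration under stated assumptions, not the authors' code: the abstract does not say which "versions" of each index were compared, so the sketch uses the standard textbook definitions, with Pearson correlation computed as the cosine of the mean-centred profiles and angular separation as the cosine of the uncentred profiles. The abstract also does not name its external index; the adjusted (corrected) Rand index is included only as one common example of an index that compares a clustering against annotation classes. All function names are hypothetical.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Euclidean distance between two profiles (a dissimilarity: 0 = identical)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def angular_separation(x, y):
    """Cosine of the angle between the raw profiles (a similarity in [-1, 1]).

    Invariant to rescaling a profile by a positive factor.
    Assumes neither profile is all zeros.
    """
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation: angular separation of the mean-centred profiles.

    Invariant to both shifting and positively rescaling a profile.
    Assumes neither profile is constant over time.
    """
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return angular_separation([a - mean_x for a in x], [b - mean_y for b in y])


def _pairs(n):
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2


def adjusted_rand(labels_a, labels_b):
    """Adjusted (corrected) Rand index between two partitions of the same genes.

    Hypothetical stand-in for the paper's unnamed external index.
    Returns 1.0 for identical partitions and about 0.0 for chance agreement.
    Assumes at least two genes.
    """
    n = len(labels_a)
    sum_ij = sum(_pairs(c) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(_pairs(c) for c in Counter(labels_a).values())
    sum_b = sum(_pairs(c) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / _pairs(n)
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:  # both partitions trivial (one cluster or all singletons)
        return 1.0
    return (sum_ij - expected) / (maximum - expected)


# Usage: two hypothetical profiles over four time points, differing by a constant offset.
g1 = [1.0, 2.0, 3.0, 2.0]
g2 = [3.0, 4.0, 5.0, 4.0]
print(euclidean(g1, g2))                    # 4.0
print(round(pearson(g1, g2), 6))            # 1.0
print(round(angular_separation(g1, g2), 3)) # 0.986
print(round(adjusted_rand([0, 0, 1, 1], [0, 0, 1, 2]), 4))  # 0.5714
```

The example shows why the choice of index matters for time series: Pearson correlation centres each profile and therefore ignores a constant offset in expression level, angular separation ignores only a change of scale, and Euclidean distance is sensitive to both. A clustering method that expects a dissimilarity would typically use 1 - r for the two similarity measures, though that conversion is likewise an assumption here.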