The ParTriCluster algorithm for gene expression analysis

  • Authors:
  • Renata Braga Araújo;Guilherme Henrique Trielli Ferreira;Gustavo Henrique Orair;Wagner Meira;Renato Antônio Celso Ferreira;Dorgival Olavo Guedes Neto;Mohammed Javeed Zaki

  • Affiliations:
  • Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil;Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil;Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil;Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil;Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil;Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil;Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY

  • Venue:
  • International Journal of Parallel Programming
  • Year:
  • 2008

Abstract

Analyzing gene expression patterns is becoming a highly relevant task in Bioinformatics. This analysis makes it possible to determine how genes behave under various conditions, information that is fundamental for treating diseases, among other applications. A recent advance in this area is the Tricluster algorithm, the first algorithm capable of determining 3D clusters (genes × samples × timestamps), that is, groups of genes that behave similarly across samples and timestamps. However, even though biological experiments collect an increasing amount of data to be analyzed and correlated, the triclustering problem remains a bottleneck because it is NP-complete, so parallelization seems to be an essential step towards obtaining feasible solutions. In this work we propose and evaluate a parallel version of the Tricluster algorithm, implemented using the filter-labeled-stream paradigm supported by the Anthill parallel programming environment. The results show that our parallelization scales well with the data size and is able to handle the severe load imbalances that are inherent to the problem. Furthermore, the parallelization strategy is applicable to any depth-first search.
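
The closing claim, that the strategy carries over to any depth-first search, can be illustrated with a small generic sketch. It is not the authors' implementation and does not use the Anthill API: the subset-sum toy problem, the function names `dfs_count` and `parallel_count`, and the `split_depth` parameter are all assumptions introduced here. The idea shown is only that nodes near the root of the search tree are placed on a shared task queue, so a worker that finishes a small subtree immediately picks up another one instead of idling, which is one common way to absorb load imbalance between subtrees of very different sizes.

```python
import queue
import threading


def dfs_count(items, start, remaining):
    """Sequential depth-first search.

    Counts subsets of items[start:] (positive integers) that sum to `remaining`.
    """
    count = 1 if remaining == 0 else 0
    for i in range(start, len(items)):
        if items[i] <= remaining:  # prune branches that overshoot the target
            count += dfs_count(items, i + 1, remaining - items[i])
    return count


def parallel_count(items, target, workers=4, split_depth=2):
    """Same search, with shallow nodes shared through a task queue.

    Nodes above `split_depth` (a hypothetical tuning knob) are expanded into
    child tasks; deeper nodes are searched sequentially by whichever worker
    dequeues them.
    """
    tasks = queue.Queue()
    lock = threading.Lock()
    total = 0

    def worker():
        nonlocal total
        while True:
            task = tasks.get()
            if task is None:  # sentinel: no more work
                tasks.task_done()
                return
            start, remaining, depth = task
            if depth < split_depth:
                # Count this node, then publish its children as new tasks.
                local = 1 if remaining == 0 else 0
                for i in range(start, len(items)):
                    if items[i] <= remaining:
                        tasks.put((i + 1, remaining - items[i], depth + 1))
            else:
                # Deep enough: finish the whole subtree locally.
                local = dfs_count(items, start, remaining)
            with lock:
                total += local
            tasks.task_done()  # children were enqueued before this call

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    tasks.put((0, target, 0))  # root of the search tree
    tasks.join()               # wait until every enqueued node is processed
    for _ in threads:
        tasks.put(None)
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(parallel_count(data, 15) == dfs_count(data, 0, 15))
```

Because of CPython's global interpreter lock, threads in this sketch illustrate the scheduling pattern rather than a real speedup; a distributed setting such as the one described in the abstract would place the workers on separate processes or machines.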