The ParTriCluster algorithm for gene expression analysis

Authors:
Renata Braga Araújo; Guilherme Henrique Trielli Ferreira; Gustavo Henrique Orair; Wagner Meira; Renato Antônio Celso Ferreira; Dorgival Olavo Guedes Neto; Mohammed Javeed Zaki
Affiliations:
Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil (first six authors); Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY (Mohammed Javeed Zaki)
Venue:
International Journal of Parallel Programming
Year:
2008

Abstract

Analyzing gene expression patterns is becoming a highly relevant task in bioinformatics. Such analysis makes it possible to determine how genes behave under various conditions, information that is fundamental for treating diseases, among other applications. A recent advance in this area is the Tricluster algorithm, the first algorithm capable of finding 3D clusters (genes × samples × timestamps), that is, groups of genes that behave similarly across both samples and timestamps. However, biological experiments collect ever larger amounts of data to be analyzed and correlated, and because the triclustering problem is NP-complete it remains a computational bottleneck, so parallelization seems an essential step towards obtaining feasible solutions. In this work we propose and evaluate a parallel implementation of the Tricluster algorithm using the filter-labeled-stream paradigm supported by the Anthill parallel programming environment. The results show that our parallelization scales well with the data size and handles the severe load imbalances inherent to the problem. Furthermore, the parallelization strategy is applicable to any depth-first search.
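The abstract describes triclusters only informally and does not spell out the search or the load-balancing scheme, so the Python sketch below is an illustration under stated assumptions, not the paper's method. It shows why a depth-first tricluster search splits naturally into independent tasks that nonetheless differ in size. Assumptions not taken from the paper: a hypothetical "scaling" coherence criterion `coherent` with relative tolerance `eps`, samples and timestamps fixed in advance (the real algorithm also searches over them), strictly positive expression values, and the hypothetical helpers `mine_subtree`, `makespan_static` and `makespan_dynamic`. Each root gene defines one depth-first subtree, and the number of search nodes it visits stands in for its cost. `makespan_static` roughly mimics fixed routing by label (assuming an identity label hash), while `makespan_dynamic` is plain list scheduling, a swapped-in technique rather than necessarily what ParTriCluster does.

```python
import heapq
from itertools import product


def coherent(data, g1, g2, samples, times, eps):
    # Hypothetical criterion (an assumption, not the paper's definition):
    # g2's profile is a constant multiple of g1's over every (sample, time)
    # cell, up to relative tolerance eps. Requires strictly positive values.
    ratios = [data[g2][s][t] / data[g1][s][t] for s, t in product(samples, times)]
    return max(ratios) / min(ratios) - 1.0 <= eps


def mine_subtree(data, root, samples, times, eps, min_genes):
    """Depth-first search over gene sets whose smallest gene is `root`.

    Returns (maximal coherent gene sets, number of search nodes visited)."""
    n = len(data)

    def fits(genes, g):
        return all(coherent(data, h, g, samples, times, eps) for h in genes)

    found, visited = [], 0
    stack = [[root]]
    while stack:
        genes = stack.pop()
        visited += 1
        # Children only add larger-indexed genes, so each set is visited once.
        for g in range(genes[-1] + 1, n):
            if fits(genes, g):  # prune: no superset of an incoherent set is coherent
                stack.append(genes + [g])
        # Report only maximal sets: no gene outside the set can extend it.
        if len(genes) >= min_genes and not any(
            fits(genes, g) for g in range(n) if g not in genes
        ):
            found.append(tuple(genes))
    return found, visited


def makespan_static(costs, workers):
    # Fixed routing: task i always goes to worker i % workers.
    loads = [0] * workers
    for i, c in enumerate(costs):
        loads[i % workers] += c
    return max(loads)


def makespan_dynamic(costs, workers):
    # List scheduling: the worker that becomes free first takes the next task.
    free_at = [0] * workers
    for c in costs:
        t = heapq.heappop(free_at)
        heapq.heappush(free_at, t + c)
    return max(free_at)


# Toy data[gene][sample][time]: genes 1 and 2 are 2x and 3x gene 0;
# gene 4 is 2x gene 3.
data = [
    [[1, 2], [3, 4]],
    [[2, 4], [6, 8]],
    [[3, 6], [9, 12]],
    [[5, 1], [1, 5]],
    [[10, 2], [2, 10]],
]
samples, times = [0, 1], [0, 1]
found, costs = [], []
for root in range(len(data)):  # one independent task per root subtree
    sets, visited = mine_subtree(data, root, samples, times, eps=0.05, min_genes=2)
    found += sets
    costs.append(visited)

print(found)  # [(0, 1, 2), (3, 4)]
print(costs)  # [4, 2, 1, 2, 1]
print(makespan_static(costs, 2), makespan_dynamic(costs, 2))  # 6 5
```

On this toy input the five subtree tasks cost 4, 2, 1, 2 and 1 nodes. With two workers, fixed routing finishes after 6 units and demand-driven assignment after 5. On larger inputs the spread of subtree sizes can be far wider, which is the kind of imbalance the abstract refers to.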