ParTriCluster: A Scalable Parallel Algorithm for Gene Expression Analysis
SBAC-PAD '06 Proceedings of the 18th International Symposium on Computer Architecture and High Performance Computing
Analyzing gene expression patterns has become a highly relevant task in bioinformatics. This analysis makes it possible to determine how genes behave under various conditions, information that is fundamental for treating diseases, among other applications. A recent advance in this area is the Tricluster algorithm, the first algorithm capable of determining 3D clusters (genes × samples × timestamps), that is, groups of genes that behave similarly across samples and timestamps. However, although biological experiments collect an increasing amount of data to be analyzed and correlated, the triclustering problem remains a bottleneck because it is NP-complete, so parallelization seems to be an essential step towards obtaining feasible solutions. In this work we propose and evaluate a parallel implementation of the Tricluster algorithm using the filter-labeled-stream paradigm supported by the Anthill parallel programming environment. The results show that our parallelization scales well with the data size and handles the severe load imbalances that are inherent to the problem. Furthermore, the parallelization strategy is applicable to any depth-first search.
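The claim that the strategy carries over to any depth-first search can be illustrated with a minimal sketch. It is not the Tricluster algorithm and does not use Anthill or its filter-labeled-stream interface; it only assumes a generic pattern in which each first-level branch of a search tree becomes an independent task, and idle workers pull the next task from a shared pool so that uneven subtree sizes are absorbed. The toy problem (enumerating index sets whose values sum to at most a limit) and all function names are hypothetical, chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def dfs(prefix, start, values, limit, out):
    """Depth-first enumeration of index sets whose values sum to at most `limit`."""
    out.append(tuple(prefix))
    total = sum(values[i] for i in prefix)
    for i in range(start, len(values)):
        if total + values[i] <= limit:  # prune branches that break the constraint
            prefix.append(i)
            dfs(prefix, i + 1, values, limit, out)
            prefix.pop()

def search_subtree(root, values, limit):
    """One task: explore the whole subtree rooted at index `root`."""
    out = []
    if values[root] <= limit:
        dfs([root], root + 1, values, limit, out)
    return out

def parallel_dfs(values, limit, workers=4):
    # Each first-level branch is an independent task. The pool hands the next
    # task to whichever worker is free, so a worker that finishes a small
    # subtree immediately picks up another one.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda r: search_subtree(r, values, limit),
                         range(len(values)))
        return sorted(s for part in parts for s in part)

def sequential_dfs(values, limit):
    """Reference single-worker search, used to check the parallel result."""
    out = []
    dfs([], 0, values, limit, out)
    return sorted(s for s in out if s)  # drop the empty root set

if __name__ == "__main__":
    print(parallel_dfs([1, 2, 3, 4], 5))
```

In CPython, threads will not speed up this CPU-bound search; the sketch shows only the task decomposition. A distributed runtime would place the same per-subtree tasks on separate processes or machines.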