Short communication: A novel parallelization approach for hierarchical clustering

Authors:
Z. Du; F. Lin
Affiliations:
BioInformatics Research Centre, Nanyang Technological University, Nanyang Avenue, Singapore 639798, Singapore
Venue:
Parallel Computing
Year:
2005

Abstract

Identifying groups of genes that manifest similar expression patterns is a key step in the analysis of gene expression data. Hierarchical clustering is commonly used for that purpose. A fundamental problem with previous implementations of this clustering method is their inability to handle large data sets within reasonable time and memory limits. In this paper, we present a parallel approach for solving this problem. The implementation of the parallel algorithm is illustrated on data from high-dimensional microarray experiments related to gene expression in cancer and in Arabidopsis seedling growth. The results show a considerable reduction in computational time and inter-node communication overhead, especially for large data sets.