Scalable co-clustering algorithms

Authors:
Bongjune Kwon;Hyuk Cho
Affiliations:
Biomedical Engineering, The University of Texas at Austin, Austin, TX;Computer Science, Sam Houston State University, Huntsville, TX
Venue:
ICA3PP'10 Proceedings of the 10th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
Year:
2010

References (9):

Biclustering of Expression Data
Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology

A Data-Clustering Algorithm on Distributed Memory Multiprocessors
Revised Papers from Large-Scale Parallel Data Mining, Workshop on Large-Scale Parallel KDD Systems, SIGKDD

P-AutoClass: Scalable Parallel Clustering for Mining Large Data Sets
IEEE Transactions on Knowledge and Data Engineering

Information-theoretic co-clustering
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining

A Scalable Collaborative Filtering Framework Based on Co-Clustering
ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining

ParRescue: Scalable Parallel Algorithm and Implementation for Biclustering over Large Distributed Datasets
ICDCS '06 Proceedings of the 26th IEEE International Conference on Distributed Computing Systems

A Generalized Maximum Entropy Approach to Bregman Co-clustering and Matrix Approximation
The Journal of Machine Learning Research

Evaluating MapReduce for Multi-core and Multiprocessor Systems
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture

Coclustering of Human Cancer Microarrays Using Minimum Sum-Squared Residue Coclustering
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Cited by (1):

Distributed scalable collaborative filtering algorithm
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I

Abstract

Co-clustering has been used extensively in varied applications because of its potential to discover latent local patterns that are not apparent to standard unsupervised algorithms such as k-means. Recently, a unified view of co-clustering algorithms, called Bregman co-clustering (BCC), has provided a general framework that subsumes several existing co-clustering algorithms, so we expect this framework to be applied to more varied data types. However, the amount of data collected from real-life application domains easily grows too large to fit in the main memory of a single-processor machine. Accordingly, enhancing the scalability of BCC can be a critical challenge in practice. To address this, and eventually to enhance its potential for rapid deployment to wider applications with larger data, we parallelize all twelve co-clustering algorithms in the BCC framework using the Message Passing Interface (MPI). In addition, we validate their scalability on eleven synthetic datasets as well as one real-life dataset, where we demonstrate their speedup performance under varied parameter settings.