A Generalized Maximum Entropy Approach to Bregman Co-clustering and Matrix Approximation

Authors:
Arindam Banerjee; Inderjit Dhillon; Joydeep Ghosh; Srujana Merugu; Dharmendra S. Modha
Venue:
The Journal of Machine Learning Research
Year:
2007

Abstract

Co-clustering, or simultaneous clustering of rows and columns of a two-dimensional data matrix, is rapidly becoming a powerful data analysis technique. Co-clustering has enjoyed wide success in varied application domains such as text clustering, gene-microarray analysis, natural language processing, and image, speech and video analysis. In this paper, we introduce a partitional co-clustering formulation that is driven by the search for a good matrix approximation: every co-clustering is associated with an approximation of the original data matrix, and the quality of the co-clustering is determined by the approximation error. We allow the approximation error to be measured using a large class of loss functions called Bregman divergences, which include squared Euclidean distance and KL-divergence as special cases. In addition, we permit multiple structurally different co-clustering schemes that preserve various linear statistics of the original data matrix. To accomplish the above tasks, we introduce a new minimum Bregman information (MBI) principle that simultaneously generalizes the maximum entropy and standard least squares principles, and leads to a matrix approximation that is optimal among all generalized additive models in a certain natural parameter space. Analysis based on this principle yields an elegant meta algorithm, special cases of which include most previously known alternate-minimization-based clustering algorithms such as k-means, and co-clustering algorithms such as information-theoretic co-clustering (Dhillon et al., 2003b) and minimum sum-squared residue co-clustering (Cho et al., 2004). To demonstrate the generality and flexibility of our co-clustering framework, we provide examples and empirical evidence on a variety of problem domains and also describe novel co-clustering applications such as missing value prediction and compression of categorical data matrices.
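
The idea of scoring a co-clustering by the error of its induced matrix approximation can be illustrated with a minimal Python sketch. It is not the paper's meta algorithm: it assumes squared Euclidean loss only, and it assumes the simplest choice of preserved statistics, namely the mean of each co-cluster. The function names (sq_euclidean, i_divergence, cocluster_means, cocluster) and all parameters are hypothetical and chosen for illustration. The I-divergence is included only to show a second Bregman divergence; it reduces to KL-divergence when both arguments sum to one.

```python
import numpy as np

def sq_euclidean(x, y):
    """Bregman divergence generated by phi(z) = z**2, summed over entries."""
    return float(np.sum((x - y) ** 2))

def i_divergence(x, y):
    """Bregman divergence generated by phi(z) = z*log(z); needs x, y > 0.
    Equals KL-divergence when x and y are probability vectors."""
    return float(np.sum(x * np.log(x / y) - x + y))

def cocluster_means(Z, rows, cols, k, l):
    """k-by-l matrix of co-cluster means (empty co-clusters are left at 0)."""
    M = np.zeros((k, l))
    for g in range(k):
        for h in range(l):
            block = Z[np.ix_(rows == g, cols == h)]
            if block.size:
                M[g, h] = block.mean()
    return M

def cocluster(Z, k, l, n_iter=20, seed=0):
    """Alternating minimization (illustrative): reassign rows, then columns,
    recomputing the co-cluster means after each step."""
    Z = np.asarray(Z, dtype=float)
    rng = np.random.default_rng(seed)
    rows = rng.integers(k, size=Z.shape[0])   # random initial row clusters
    cols = rng.integers(l, size=Z.shape[1])   # random initial column clusters
    errors = []
    for _ in range(n_iter):
        M = cocluster_means(Z, rows, cols, k, l)
        # Row step: cost[i, g] = sum_j (Z[i, j] - M[g, cols[j]])**2
        cost = ((Z[:, None, :] - M[:, cols][None, :, :]) ** 2).sum(axis=2)
        rows = cost.argmin(axis=1)
        M = cocluster_means(Z, rows, cols, k, l)
        # Column step: cost[j, h] = sum_i (Z[i, j] - M[rows[i], h])**2
        cost = ((Z.T[:, None, :] - M[rows, :].T[None, :, :]) ** 2).sum(axis=2)
        cols = cost.argmin(axis=1)
        M = cocluster_means(Z, rows, cols, k, l)
        # Approximation: each entry replaced by the mean of its co-cluster
        errors.append(sq_euclidean(Z, M[np.ix_(rows, cols)]))
    return rows, cols, M, errors
```

Under these assumptions the recorded approximation error cannot increase from one iteration to the next, because each reassignment step and each recomputation of the means minimizes the same squared-error objective with the other quantities held fixed. The procedure may still stop at a local minimum that depends on the random initialization.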