SCOAL: A framework for simultaneous co-clustering and learning from complex data

Authors:
Meghana Deodhar;Joydeep Ghosh
Affiliations:
University of Texas at Austin, Austin, TX;University of Texas at Austin, Austin, TX
Venue:
ACM Transactions on Knowledge Discovery from Data (TKDD)
Year:
2010

Abstract

For difficult classification or regression problems, practitioners often segment the data into relatively homogeneous groups and then build a predictive model for each group. This two-step procedure usually results in simpler, more interpretable and actionable models without any loss in accuracy. In this work, we consider problems such as predicting customer behavior across products, where the independent variables can be naturally partitioned into two sets, that is, the data is dyadic in nature. A pivoting operation now results in the dependent variable showing up as entries in a “customer by product” data matrix. We present the Simultaneous CO-clustering And Learning (SCOAL) framework, based on the key idea of interleaving co-clustering and construction of prediction models to iteratively improve both cluster assignment and fit of the models. This algorithm provably converges to a local minimum of a suitable cost function. The framework not only generalizes co-clustering and collaborative filtering to model-based co-clustering, but can also be viewed as simultaneous co-segmentation and classification or regression, which is typically better than independently clustering the data first and then building models. Moreover, it applies to a wide range of bi-modal or multimodal data, and can be easily specialized to address classification and regression problems. We demonstrate the effectiveness of our approach on both these problems through experimentation on a variety of datasets.
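The interleaving idea in the abstract can be illustrated with a minimal sketch for the regression case. This is not the authors' implementation: the function name `scoal_regression_sketch`, the squared-error cost, the per-co-cluster linear models, the random initialization, and the assumption of a fully observed response matrix are all choices made here for illustration. Each iteration refits one least-squares model per co-cluster, then reassigns every row, and then every column, to the cluster that gives it the lowest error. None of the three steps can increase the total squared error, which is the intuition behind convergence to a local minimum.

```python
import numpy as np

def scoal_regression_sketch(Z, X, k, l, n_iter=20, seed=0):
    """Illustrative alternating minimization (hypothetical helper, not the paper's code).

    Z: (m, n) response matrix, assumed fully observed.
    X: (m, n, d) covariate vector for every (row, column) cell.
    k, l: number of row and column clusters.
    Returns row labels, column labels, per-co-cluster coefficients,
    and the total squared error recorded after each iteration.
    """
    rng = np.random.default_rng(seed)
    m, n, d = X.shape
    rows = rng.integers(k, size=m)   # random initial row clusters
    cols = rng.integers(l, size=n)   # random initial column clusters
    betas = np.zeros((k, l, d))
    history = []

    for _ in range(n_iter):
        # Step 1: refit a least-squares model inside every co-cluster.
        for g in range(k):
            for h in range(l):
                r = np.flatnonzero(rows == g)
                c = np.flatnonzero(cols == h)
                if r.size == 0 or c.size == 0:
                    continue  # empty co-cluster: keep the previous model
                A = X[np.ix_(r, c)].reshape(-1, d)
                b = Z[np.ix_(r, c)].reshape(-1)
                betas[g, h] = np.linalg.lstsq(A, b, rcond=None)[0]

        # Step 2: move each row to the row cluster with the lowest error,
        # holding column clusters and models fixed.
        B = betas[:, cols, :]                          # (k, n, d)
        pred = np.einsum("ijd,gjd->gij", X, B)         # (k, m, n)
        rows = ((Z[None] - pred) ** 2).sum(axis=2).argmin(axis=0)

        # Step 3: move each column to the column cluster with the lowest
        # error, holding row clusters and models fixed.
        B = betas[rows]                                # (m, l, d)
        pred = np.einsum("ijd,ihd->hij", X, B)         # (l, m, n)
        cols = ((Z[None] - pred) ** 2).sum(axis=1).argmin(axis=0)

        # Track the cost: total squared error under current assignments.
        fitted = np.einsum("ijd,ijd->ij", X, betas[rows][:, cols])
        history.append(float(((Z - fitted) ** 2).sum()))

    return rows, cols, betas, history

if __name__ == "__main__":
    # Synthetic data only; shapes and values are arbitrary.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(30, 20, 3))
    Z = rng.normal(size=(30, 20))
    rows, cols, betas, history = scoal_regression_sketch(Z, X, k=2, l=2)
    print(history[0], history[-1])
```

In this sketch the recorded cost is non-increasing from one iteration to the next, up to floating-point error. Handling missing entries or classification would require a different per-cell loss, which is left out here.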