We derive PAC-Bayesian generalization bounds for supervised and unsupervised learning models based on clustering, such as co-clustering, matrix tri-factorization, graphical models, graph clustering, and pairwise clustering. We begin with the analysis of co-clustering, which is a widely used approach to the analysis of data matrices. We distinguish between two tasks in matrix data analysis: discriminative prediction of the missing entries in data matrices and estimation of the joint probability distribution of row and column variables in co-occurrence matrices. We derive PAC-Bayesian generalization bounds for the expected out-of-sample performance of co-clustering-based solutions for these two tasks. The analysis yields regularization terms that were absent from previous formulations of co-clustering. The bounds suggest that the expected performance of co-clustering is governed by a trade-off between its empirical performance and the mutual information preserved by the cluster variables on row and column IDs. We derive an iterative projection algorithm for finding a local optimum of this trade-off for discriminative prediction tasks. This algorithm achieved state-of-the-art performance on the MovieLens collaborative filtering task. Our co-clustering model can also be seen as matrix tri-factorization, and the results provide generalization bounds, regularization terms, and new algorithms for this form of matrix factorization. The analysis of co-clustering is extended to tree-shaped graphical models, which can be used to analyze high-dimensional tensors. According to the bounds, the generalization abilities of tree-shaped graphical models depend on a trade-off between their empirical data fit and the mutual information that is propagated up the tree levels. We also formulate weighted graph clustering as a prediction problem: given a subset of edge weights, we analyze the ability of graph clustering to predict the remaining edge weights. The analysis of co-clustering extends easily to this problem and suggests that graph clustering should optimize the trade-off between empirical data fit and the mutual information that clusters preserve on graph nodes.
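The trade-off described above, between empirical performance and the mutual information that cluster variables preserve on row and column IDs, can be illustrated with a minimal Python sketch. It is not the bound or the algorithm derived in the paper. The function names, the uniform distribution over IDs, the weighting of each information term by the number of IDs, and the trade-off parameter `beta` are all assumptions made for illustration.

```python
import math


def mutual_information(q):
    """I(X;C) in nats for soft assignments q[x][c] = q(c|x).

    Assumes (for illustration) a uniform distribution over the IDs x.
    """
    n = len(q)
    k = len(q[0])
    # Cluster marginal under uniform p(x): q(c) = (1/n) * sum_x q(c|x)
    marginal = [sum(row[c] for row in q) / n for c in range(k)]
    mi = 0.0
    for row in q:
        for c, p in enumerate(row):
            if p > 0.0:  # treat 0 * log 0 as 0
                mi += (p / n) * math.log(p / marginal[c])
    return mi


def tradeoff(empirical_loss, n_samples, assignments, beta):
    """Hypothetical objective: beta * N * empirical loss + sum_i n_i * I(X_i; C_i).

    `assignments` holds one soft-assignment matrix per dimension
    (e.g. rows and columns); n_i is the number of IDs in dimension i.
    """
    penalty = sum(len(q) * mutual_information(q) for q in assignments)
    return beta * n_samples * empirical_loss + penalty


# Hard assignment of 4 row IDs to 2 balanced clusters preserves ln 2 nats.
hard_rows = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
# Assigning every column ID uniformly to both clusters preserves no information.
uniform_cols = [[0.5, 0.5]] * 4

objective = tradeoff(0.25, 100, [hard_rows, uniform_cols], beta=1.0)
```

In this sketch, a larger `beta` favors empirical fit, while a smaller `beta` favors assignments that retain less information about the individual IDs, which is the regularizing effect the bounds suggest.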