The minimum transfer cost principle for model-order selection

Authors:
Mario Frank;Morteza Haghir Chehreghani;Joachim M. Buhmann
Affiliations:
Department of Computer Science, ETH Zurich, Switzerland;Department of Computer Science, ETH Zurich, Switzerland;Department of Computer Science, ETH Zurich, Switzerland
Venue:
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part I
Year:
2011

Abstract

The goal of model-order selection is to select a model variant that generalizes best from training data to unseen test data. In unsupervised learning without any labels, computing the generalization error of a solution poses a conceptual problem, which we address in this paper. We formulate the principle of "minimum transfer costs" for model-order selection. This principle renders the concept of cross-validation applicable to unsupervised learning problems. As a substitute for labels, we introduce a mapping between objects of the training set and objects of the test set that enables the transfer of training solutions. Our method is explained and investigated by applying it to well-known problems such as singular-value decomposition, correlation clustering, Gaussian mixture models, and k-means clustering. Our principle finds the optimal model complexity in controlled experiments and in real-world problems such as image denoising, role mining, and detection of misconfigurations in access-control data.
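To make the idea concrete, here is a minimal Python/NumPy sketch of minimum-transfer-cost selection of the number of clusters k for k-means. It rests on assumptions not stated in the abstract: the data are split at random into two halves; each test object is mapped to its nearest training object in Euclidean distance (the abstract says only that some mapping between training and test objects is introduced); and the cost is the average squared distance to the assigned centroid. The function names `kmeans`, `nearest_neighbour_map`, `transfer_cost` and `select_k` are hypothetical. This is an illustrative sketch, not the authors' implementation.

```python
import numpy as np


def _sq_dists(A, B):
    """Pairwise squared Euclidean distances between rows of A and rows of B."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct training points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = _sq_dists(X, centroids).argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    # Recompute labels so they match the final centroids.
    return centroids, _sq_dists(X, centroids).argmin(axis=1)


def nearest_neighbour_map(X_test, X_train):
    """Assumed mapping: index of the nearest training object for each test object."""
    return _sq_dists(X_test, X_train).argmin(axis=1)


def transfer_cost(X_train, X_test, k, seed=0):
    """k-means cost of the training solution transferred to the test objects."""
    centroids, train_labels = kmeans(X_train, k, seed=seed)
    psi = nearest_neighbour_map(X_test, X_train)
    # Each test object inherits the cluster of its mapped training object.
    test_labels = train_labels[psi]
    return float(((X_test - centroids[test_labels]) ** 2).sum(axis=1).mean())


def select_k(X, candidates, seed=0):
    """Split X in half at random and return the k with the smallest transfer cost."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))
    half = len(X) // 2
    X_train, X_test = X[perm[:half]], X[perm[half:]]
    costs = {k: transfer_cost(X_train, X_test, k, seed=seed) for k in candidates}
    return min(costs, key=costs.get), costs


# Usage on synthetic 2-D data drawn around three centres.
rng = np.random.default_rng(1)
centres = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(100, 2)) for c in centres])
best_k, costs = select_k(X, candidates=range(1, 7))
print(best_k, costs)
```

Test objects never receive their own optimal assignment in this design. They inherit the assignment of the training object they are mapped to, so the cost measures how well the training solution transfers rather than how well it fits. Whether the minimum falls on the generating number of clusters depends on the data, the mapping and the cost function. The sketch makes no claim about that outcome.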