Model order selection for Boolean matrix factorization

Authors:
Pauli Miettinen; Jilles Vreeken
Affiliations:
Max Planck Institute for Informatics, Saarbrücken, Germany; Universiteit Antwerpen, Antwerp, Belgium
Venue:
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2011

Abstract

Matrix factorizations, in which a given data matrix is approximated by a product of two or more factor matrices, are powerful data mining tools. Among other tasks, they are often used to separate global structure from noise. This, however, requires solving the "model order selection problem": determining where fine-grained structure stops and noise starts, i.e., what the proper size of the factor matrices is. Boolean matrix factorization (BMF), in which the data, the factors, and the matrix product are all Boolean, has received increased attention from the data mining community in recent years. The technique has desirable properties, such as high interpretability and natural sparsity. So far, however, no method for selecting the correct model order for BMF has been available. In this paper we propose to use the Minimum Description Length (MDL) principle for this task. Besides solving the problem, this well-founded approach has numerous benefits: it is automatic, does not require a likelihood function, is fast, and, as experiments show, is highly accurate. We formulate the description length function for BMF in general, making it applicable to any BMF algorithm. We extend an existing algorithm for BMF to use MDL to identify the best Boolean matrix factorization, analyze the complexity of the problem, and perform an extensive experimental evaluation to study its behavior.
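
To make the idea concrete, the following is a minimal sketch of MDL-based model order selection on a toy matrix. It is not the encoding or the algorithm from the paper: the cost function `matrix_bits` (a simple enumerative code for a binary matrix), the helper names, and the hand-built candidate factors are all assumptions chosen for illustration. The total description length is the cost of the two factor matrices plus the cost of the error matrix (the XOR of the data and the Boolean product of the factors), and the selected model order is the k with the smallest total.

```python
from math import comb, log2


def boolean_product(B, C, m):
    """Boolean product of B (n x k) and C (k x m): OR over l of (B[i][l] AND C[l][j])."""
    k = len(C)
    return [[int(any(row[l] and C[l][j] for l in range(k))) for j in range(m)]
            for row in B]


def matrix_bits(rows):
    """Assumed toy code: send the number of ones, then which cells hold them."""
    cells = sum(len(r) for r in rows)
    ones = sum(map(sum, rows))
    return log2(cells + 1) + log2(comb(cells, ones))


def description_length(A, B, C):
    """Two-part cost: bits for the factors plus bits for the error matrix."""
    approx = boolean_product(B, C, len(A[0]))
    error = [[a ^ p for a, p in zip(ra, rp)] for ra, rp in zip(A, approx)]
    return matrix_bits(B) + matrix_bits(C) + matrix_bits(error)


# Toy data: two 6 x 6 blocks of ones on the diagonal plus one noisy cell.
n = 12
A = [[int((i < 6) == (j < 6)) for j in range(n)] for i in range(n)]
A[0][11] = 1  # a single noise bit outside both blocks

# Hand-built rank-1 candidates (column vector, row vector), in the order
# a greedy method might add them: block 1, block 2, then the noise cell.
upper = [int(i < 6) for i in range(n)]
lower = [1 - x for x in upper]
factors = [
    (upper, upper),
    (lower, lower),
    ([int(i == 0) for i in range(n)], [int(j == 11) for j in range(n)]),
]


def factors_for(k):
    """Assemble B (n x k) and C (k x n) from the first k candidates."""
    B = [[u[i] for u, _ in factors[:k]] for i in range(n)]
    C = [list(v) for _, v in factors[:k]]
    return B, C


costs = {k: description_length(A, *factors_for(k)) for k in range(len(factors) + 1)}
best_k = min(costs, key=costs.get)

for k, bits in costs.items():
    print(f"k={k}: {bits:.1f} bits")
print("selected model order:", best_k)
```

In this toy setting the third factor explains only the single noisy cell, so describing it costs more bits than leaving that cell in the error matrix. That is the sense in which a description length criterion can separate structure from noise without a user-supplied k.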