Matrix factorizations, in which a given data matrix is approximated by a product of two or more factor matrices, are powerful data mining tools. Among other tasks, they are often used to separate global structure from noise. This, however, requires solving the "model order selection problem": determining where fine-grained structure stops and noise starts, that is, choosing the proper size of the factor matrices. Boolean matrix factorization (BMF), in which the data, the factors, and the matrix product are all Boolean, has received increased attention from the data mining community in recent years. The technique has desirable properties, such as high interpretability and natural sparsity, but so far no method for selecting the correct model order for BMF has been available. In this paper we propose to use the Minimum Description Length (MDL) principle for this task. Besides solving the problem, this well-founded approach has numerous benefits: it is automatic, does not require a likelihood function, is fast, and, as experiments show, is highly accurate. We formulate the description length function for BMF in general, making it applicable to any BMF algorithm. We extend an existing algorithm for BMF to use MDL to identify the best Boolean matrix factorization, analyze the complexity of the problem, and perform an extensive experimental evaluation to study its behavior.
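To make the idea of MDL-based model order selection concrete, the following is a minimal sketch, not the description length function formulated in the paper. It assumes a deliberately naive two-part code: each binary matrix (the two factors and the error matrix) is encoded by stating its number of ones and then its index among all matrices with that many ones. The function names (`boolean_product`, `matrix_bits`, `description_length`, `select_model_order`) and the toy data are hypothetical, and the candidate factorizations are written by hand rather than produced by a BMF algorithm.

```python
from math import comb, log2


def boolean_product(B, C, n_cols):
    """Boolean matrix product: entry (i, j) is OR over l of (B[i][l] AND C[l][j])."""
    return [[int(any(b and C[l][j] for l, b in enumerate(row)))
             for j in range(n_cols)]
            for row in B]


def matrix_bits(M):
    """Bits to encode a binary matrix under the assumed naive code:
    the count of ones, then an index among all matrices with that count."""
    cells = sum(len(row) for row in M)
    ones = sum(sum(row) for row in M)
    return log2(cells + 1) + log2(comb(cells, ones))


def description_length(A, B, C):
    """Two-part length: the model (B and C) plus the data given the model,
    i.e. the error matrix A XOR (B o C)."""
    approx = boolean_product(B, C, len(A[0]))
    error = [[a ^ p for a, p in zip(row_a, row_p)]
             for row_a, row_p in zip(A, approx)]
    return matrix_bits(B) + matrix_bits(C) + matrix_bits(error)


def select_model_order(A, candidates):
    """Return the k whose factorization (B, C) gives the shortest description."""
    return min(candidates, key=lambda k: description_length(A, *candidates[k]))


# Toy data (assumed): two 4x4 blocks of ones on the diagonal plus one noisy cell.
n = 8
A = [[int((i < 4) == (j < 4)) for j in range(n)] for i in range(n)]
A[0][7] = 1  # noise

# Hand-made candidate factorizations for k = 0..3 (illustrative only).
B2 = [[int(i < 4), int(i >= 4)] for i in range(n)]
C2 = [[int(j < 4) for j in range(n)], [int(j >= 4) for j in range(n)]]
candidates = {
    0: ([[] for _ in range(n)], []),
    1: ([[row[0]] for row in B2], [C2[0]]),
    2: (B2, C2),
    # k = 3 adds a third factor that covers only the noisy cell.
    3: ([row + [int(i == 0)] for i, row in enumerate(B2)],
        C2 + [[int(j == 7) for j in range(n)]]),
}

for k in sorted(candidates):
    print(k, round(description_length(A, *candidates[k]), 2))
print("selected k:", select_model_order(A, candidates))
```

On this toy matrix the two-factor model has the shortest total length: a third factor that explains only the noisy cell costs more bits than leaving that cell in the error matrix. Under a different encoding the absolute numbers, and possibly the selected order, would change, which is why the choice of description length function matters.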