Mining discrete patterns via binary matrix factorization

Authors:
Bao-Hong Shen; Shuiwang Ji; Jieping Ye
Affiliations:
Arizona State University, Tempe, AZ, USA; Arizona State University, Tempe, AZ, USA; Arizona State University, Tempe, AZ, USA
Venue:
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2009

Abstract

Mining discrete patterns in binary data is important for subsampling, compression, and clustering. We consider rank-one binary matrix approximations, which identify the dominant patterns in the data while preserving its discrete nature. The best approximation is the one with the fewest inconsistent entries, i.e., mismatches between the given binary data and the approximating matrix. Because the problem is hard, previous approaches rely on heuristics, and the resulting approximation may be far from optimal. In this paper, we show that rank-one binary matrix approximation can be reformulated as a 0-1 integer linear program (ILP). However, the ILP formulation is computationally expensive even for small matrices. We therefore propose a linear program (LP) relaxation, which is shown to achieve a guaranteed approximation error bound. We further extend the proposed formulations with regularization, a technique commonly employed to address overfitting. The LP formulation is limited to medium-size matrices because the number of variables becomes large for large matrices. Interestingly, we show that the proposed approximate formulation can be transformed into an instance of the minimum s-t cut problem, which can be solved efficiently by finding maximum flows. Our empirical study demonstrates the efficiency of the proposed maximum-flow-based algorithm, and the results confirm the established theoretical bounds.
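
To make the objective concrete, the sketch below counts the mismatches between a small binary matrix A and a rank-one binary matrix x y^T, then finds the best pair (x, y) by exhaustive search. It is an illustration of the problem statement only, not the ILP, LP, or maximum-flow method of the paper. The function names (`mismatches`, `best_y_for`, `brute_force_rank_one`) and the example matrix are assumptions introduced here. The search is exponential in the number of rows, so it is suitable only for tiny inputs.

```python
from itertools import product


def mismatches(A, x, y):
    """Count entries where A differs from the rank-one binary matrix x y^T."""
    return sum(
        A[i][j] != (x[i] & y[j])
        for i in range(len(A))
        for j in range(len(A[0]))
    )


def best_y_for(A, x):
    """For a fixed row indicator x, choose each y_j to minimize column-j mismatches.

    Setting y_j = 1 helps only if, among the rows selected by x, column j
    contains more ones than zeros; rows with x_i = 0 are unaffected by y_j.
    """
    y = []
    for j in range(len(A[0])):
        ones = sum(A[i][j] for i in range(len(A)) if x[i])
        zeros = sum(1 - A[i][j] for i in range(len(A)) if x[i])
        y.append(1 if ones > zeros else 0)
    return y


def brute_force_rank_one(A):
    """Try every x in {0,1}^m and keep the pair with the fewest mismatches."""
    best = None
    for x in product((0, 1), repeat=len(A)):
        y = best_y_for(A, x)
        err = mismatches(A, x, y)
        if best is None or err < best[0]:
            best = (err, list(x), y)
    return best


# Hypothetical 4x4 example: three similar rows plus one unrelated row.
A = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
err, x, y = brute_force_rank_one(A)
print(err, x, y)  # prints: 2 [1, 1, 1, 0] [1, 1, 0, 1]
```

Here the dominant pattern covers the first three rows and columns 1, 2, and 4, leaving two inconsistent entries. Because this search cannot scale beyond a handful of rows, it mainly serves to show why formulations that can be solved efficiently, such as the relaxation and the minimum s-t cut instance described in the abstract, are of interest.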