Discovery of optimal factors in binary data via a novel method of matrix decomposition

  • Authors:
  • Radim Belohlavek; Vilem Vychodil

  • Affiliations:
  • Department of Systems Science and Industrial Engineering, Binghamton University-SUNY, Binghamton, NY 13902, USA and Department of Computer Science, Palacký University, Tomkova 40, CZ-779 00 ...; Department of Systems Science and Industrial Engineering, Binghamton University-SUNY, Binghamton, NY 13902, USA and Department of Computer Science, Palacký University, Tomkova 40, CZ-779 00 ...

  • Venue:
  • Journal of Computer and System Sciences
  • Year:
  • 2010

Abstract

We present a novel method of decomposition of an n×m binary matrix I into a Boolean product A∘B of an n×k binary matrix A and a k×m binary matrix B with k as small as possible. Attempts to solve this problem are known from Boolean factor analysis, where I is interpreted as an object-attribute matrix, A and B are interpreted as object-factor and factor-attribute matrices, and the aim is to find a decomposition with a small number k of factors. The method presented here is based on a theorem proved in this paper. It says that optimal decompositions, i.e., those with the least number of factors possible, are those where factors are formal concepts in the sense of formal concept analysis. Finding an optimal decomposition is an NP-hard problem. However, we present an approximation algorithm for finding optimal decompositions which is based on the insight provided by the theorem. The algorithm avoids the need to compute all formal concepts and significantly outperforms a greedy approximation algorithm for a set covering problem to which the problem of matrix decomposition is easily shown to be reducible. We present results of several experiments with various data sets, including those from the CIA World Factbook and the UCI Machine Learning Repository. In addition, we present further geometric insight, including a description of transformations between the space of attributes and the space of factors.
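
To make the Boolean product and the role of formal concepts as factors concrete, the following is a minimal Python sketch. It is not the algorithm from the paper: it is a simplified greedy cover that only considers the formal concepts generated by single attributes, and it makes no claim about how close the resulting k is to the optimum. The function names (`boolean_product`, `attribute_concept`, `greedy_concept_factorization`) are hypothetical and introduced here for illustration only.

```python
import numpy as np

def boolean_product(A, B):
    """Boolean product: entry (i, j) is 1 iff A[i, l] = B[l, j] = 1 for some l."""
    return ((np.asarray(A, dtype=int) @ np.asarray(B, dtype=int)) > 0).astype(int)

def attribute_concept(I, j):
    """Formal concept generated by attribute j, as boolean masks (extent, intent)."""
    extent = I[:, j] == 1                      # objects that have attribute j
    if extent.any():
        intent = I[extent].all(axis=0)         # attributes shared by all those objects
    else:
        intent = np.ones(I.shape[1], dtype=bool)  # empty extent: every attribute is shared
    return extent, intent

def greedy_concept_factorization(I):
    """Illustrative greedy cover of the 1s of I by attribute concepts.

    Assumption: candidates are restricted to attribute concepts. Every 1 at
    (i, j) lies in the concept generated by attribute j, so the loop always
    ends with an exact decomposition using at most m factors.
    """
    I = np.asarray(I, dtype=int)
    n, m = I.shape
    candidates = [attribute_concept(I, j) for j in range(m)]
    uncovered = I.astype(bool).copy()
    columns, rows = [], []
    while uncovered.any():
        # Pick the candidate whose rectangle covers the most uncovered 1s.
        extent, intent = max(
            candidates,
            key=lambda c: int(uncovered[np.ix_(c[0], c[1])].sum()),
        )
        columns.append(extent)                 # becomes a column of A
        rows.append(intent)                    # becomes a row of B
        uncovered[np.ix_(extent, intent)] = False
    k = len(columns)
    A = np.array(columns, dtype=int).T.reshape(n, k)
    B = np.array(rows, dtype=int).reshape(k, m)
    return A, B
```

Each chosen concept contributes its extent as one column of A and its intent as one row of B, so the extent-by-intent rectangle of 1s is exactly what that factor adds to the Boolean product. Calling `greedy_concept_factorization(I)` on a small binary matrix and comparing `boolean_product(A, B)` with `I` is a quick way to check that the decomposition is exact.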