An Index for the Data Size to Extract Decomposable Structures in LAD

  • Authors:
  • Hirotaka Ono; Mutsunori Yagiura; Toshihide Ibaraki

  • Venue:
  • ISAAC '01 Proceedings of the 12th International Symposium on Algorithms and Computation
  • Year:
  • 2001

Abstract

Logical analysis of data (LAD) is a methodology for extracting knowledge as a Boolean function f from a given pair of data sets (T, F) on an attribute set S of size n, in which T (resp., F) ⊆ $\{0, 1\}^n$ denotes a set of positive (resp., negative) examples for the phenomenon under consideration. In this paper, we consider the case in which the extracted knowledge has a decomposable structure; i.e., f can be written in the form f(x) = g(x[S0], h(x[S1])) for some S0, S1 ⊆ S and Boolean functions g and h, where x[I] denotes the projection of vector x on I. To detect meaningful decomposable structures, the sizes |T| and |F| are expected to have to be sufficiently large. We provide an index for this indispensable number of examples, based on probabilistic analysis. Using p = |T|/(|T| + |F|) and q = |F|/(|T| + |F|), we claim that there exist many deceptive decomposable structures of (T, F) if |T| + |F| ≤ $\sqrt{2^{n-1}/(pq)}$. Computational results on synthetically generated data sets show that this index gives a good lower bound on the indispensable data size.
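
As a rough illustration of how the index in the abstract could be evaluated, the following minimal sketch computes the threshold, read here as the square root of 2^(n-1)/(pq), from n, |T| and |F|. The function names `decomposability_index` and `is_below_index` are hypothetical and do not come from the paper; the sketch only restates the formula and does not implement the authors' probabilistic analysis or any decomposability test.

```python
import math


def decomposability_index(n, num_pos, num_neg):
    """Return sqrt(2^(n-1) / (p*q)) for n attributes, |T| = num_pos, |F| = num_neg.

    Hypothetical helper: it only evaluates the formula quoted in the abstract.
    """
    if n < 1 or num_pos <= 0 or num_neg <= 0:
        raise ValueError("need n >= 1 and non-empty T and F")
    total = num_pos + num_neg
    p = num_pos / total  # fraction of positive examples
    q = num_neg / total  # fraction of negative examples
    return math.sqrt(2 ** (n - 1) / (p * q))


def is_below_index(n, num_pos, num_neg):
    """True if |T| + |F| is at or below the index, i.e. in the regime where
    the abstract claims many deceptive decomposable structures exist."""
    return num_pos + num_neg <= decomposability_index(n, num_pos, num_neg)


if __name__ == "__main__":
    # n = 11, |T| = |F| = 50: p = q = 0.5, so the index is sqrt(1024 / 0.25) = 64.
    print(decomposability_index(11, 50, 50))  # 64.0
    print(is_below_index(11, 50, 50))         # False: 100 examples exceed the index
```

With balanced data (p = q = 0.5) the index grows by a factor of about 1.41 per additional attribute, so under this reading the data size needed to rule out deceptive structures rises exponentially in n.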