Mining fault-tolerant item sets using subset size occurrence distributions

  • Authors:
  • Christian Borgelt; Tobias Kötter

  • Affiliations:
  • European Centre for Soft Computing, Mieres, Asturias, Spain; Dept. of Computer Science, University of Konstanz, Konstanz, Germany

  • Venue:
  • IDA'11: Proceedings of the 10th International Conference on Advances in Intelligent Data Analysis X
  • Year:
  • 2011

Abstract

Mining fault-tolerant (or approximate or fuzzy) item sets means allowing for errors in the underlying transaction data, in the sense that items that are actually present may not be recorded due to noise or measurement errors. To cope with such missing items, transactions that do not contain all items of a given set are still allowed to support it. However, either the number of missing items must be limited, or the transaction's contribution to the item set's support must be reduced in proportion to the number of missing items, or both. In this paper we present an algorithm that efficiently computes the subset size occurrence distribution of item sets, evaluates this distribution to find fault-tolerant item sets, and exploits intermediate data to remove pseudo (or spurious) item sets. We demonstrate the usefulness of our algorithm by applying it to a concept detection task on the 2008/2009 Wikipedia Selection for schools.
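
To make the notions in the abstract concrete, the following minimal Python sketch computes a subset size occurrence distribution for one item set by brute force and then evaluates it with both a limit on missing items and a per-item penalty. It is an illustration only, not the efficient algorithm of the paper: the function names, the toy transactions, the linear penalty weighting, and the parameters `max_missing` and `penalty` are all assumptions made for this example.

```python
from collections import Counter


def subset_size_distribution(transactions, item_set):
    """Return a list d where d[k] is the number of transactions that
    contain exactly k items of item_set (naive scan, for illustration)."""
    items = frozenset(item_set)
    counts = Counter(len(items & set(t)) for t in transactions)
    return [counts.get(k, 0) for k in range(len(items) + 1)]


def fault_tolerant_support(distribution, max_missing=1, penalty=0.5):
    """Evaluate a distribution: transactions missing at most max_missing
    items contribute, each weighted by 1 - penalty * missing (assumed
    linear down-weighting, clipped at zero)."""
    n = len(distribution) - 1  # size of the item set
    support = 0.0
    for missing in range(min(max_missing, n) + 1):
        weight = max(0.0, 1.0 - penalty * missing)
        support += weight * distribution[n - missing]
    return support


# Hypothetical toy data: five transactions over the items a, b, c, d.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c", "d"},
    {"b", "d"},
    {"a", "b", "c", "d"},
]

dist = subset_size_distribution(transactions, {"a", "b", "c"})
exact = fault_tolerant_support(dist, max_missing=0)
tolerant = fault_tolerant_support(dist, max_missing=1, penalty=0.5)
print(dist, exact, tolerant)
```

With `max_missing=0` the evaluation reduces to the ordinary (exact) support, so the same distribution serves both the classical and the fault-tolerant case; only the evaluation step changes.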