Item set mining based on cover similarity

  • Authors:
  • Marc Segond; Christian Borgelt

  • Affiliations:
  • European Centre for Soft Computing, Mieres, Asturias, Spain; European Centre for Soft Computing, Mieres, Asturias, Spain

  • Venue:
  • PAKDD'11: Proceedings of the 15th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, Volume Part II
  • Year:
  • 2011

Abstract

While in standard frequent item set mining one tries to find item sets whose support exceeds a user-specified threshold (minimum support) in a database of transactions, we strive to find item sets for which the similarity of their covers (that is, the sets of transactions containing them) exceeds a user-specified threshold. Starting from the generalized Jaccard index, we extend our approach to a total of twelve specific similarity measures and a generalized form. We present an efficient mining algorithm that is inspired by the well-known Eclat algorithm and its improvements. Experiments on several benchmark data sets demonstrate that the runtime penalty incurred by the more complex (but also more informative) item set assessment is bearable and that the approach yields high-quality and more useful item sets.
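
To make the idea of cover similarity concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes that the generalized Jaccard index of an item set is the size of the intersection of the covers of its items divided by the size of their union, and it uses a naive depth-first search over a vertical (item to transaction-id set) representation in the spirit of Eclat. The function names `covers`, `generalized_jaccard`, and `mine_similar_item_sets` and the toy database are hypothetical, chosen only for illustration.

```python
def covers(transactions):
    """Map each item to the set of indices of the transactions containing it."""
    result = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            result.setdefault(item, set()).add(tid)
    return result


def generalized_jaccard(item_set, item_covers):
    """Assumed definition: |intersection of item covers| / |union of item covers|."""
    cover_sets = [item_covers.get(item, set()) for item in item_set]
    if not cover_sets:
        return 0.0
    union = set.union(*cover_sets)
    if not union:
        return 0.0
    return len(set.intersection(*cover_sets)) / len(union)


def mine_similar_item_sets(transactions, min_sim):
    """Naive depth-first search for item sets (two or more items) whose
    cover similarity is at least min_sim.

    Adding an item can only shrink the intersection and grow the union,
    so the similarity never increases along a branch and a branch can be
    cut as soon as it drops below the threshold.
    """
    item_covers = covers(transactions)
    items = sorted(item_covers)
    found = {}

    def extend(prefix, inter, union, start):
        for k in range(start, len(items)):
            cover = item_covers[items[k]]
            new_inter = inter & cover
            new_union = union | cover  # never empty: every cover is non-empty
            sim = len(new_inter) / len(new_union)
            if sim >= min_sim:
                new_prefix = prefix + (items[k],)
                found[new_prefix] = sim
                extend(new_prefix, new_inter, new_union, k + 1)

    for k, item in enumerate(items):
        extend((item,), item_covers[item], item_covers[item], k + 1)
    return found


# Toy transaction database (illustrative only).
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
item_covers = covers(transactions)
result = mine_similar_item_sets(transactions, min_sim=0.5)
```

In this toy database the pair {a, b} is contained in 3 of the 5 transactions that contain a or b, so its similarity is 3/5, while the triple {a, b, c} scores 2/5 and is therefore not reported at a threshold of 0.5. The sketch carries the running intersection and union of transaction-id sets down the search tree; the paper's actual algorithm and its optimizations are not reproduced here.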