Comparing Datasets Using Frequent Itemsets: Dependency on the Mining Parameters

  • Authors:
  • Irene Ntoutsi; Yannis Theodoridis

  • Affiliations:
  • Department of Informatics, University of Piraeus, Greece; Department of Informatics, University of Piraeus, Greece

  • Venue:
  • SETN '08 Proceedings of the 5th Hellenic conference on Artificial Intelligence: Theories, Models and Applications
  • Year:
  • 2008

Abstract

Comparing sets of frequent itemsets has traditionally been used as a proxy for comparing raw datasets, on the assumption that frequent itemsets inherit the information contained in the original data. In this work, we revisit this assumption and examine whether the dissimilarity between sets of frequent itemsets can serve as a measure of dissimilarity between the raw datasets. In particular, we investigate how the dissimilarity between two sets of frequent itemsets is affected by the minSupport threshold used to generate them and by the adopted compactness level of the itemset lattice, namely frequent itemsets, closed frequent itemsets, or maximal frequent itemsets. Our analysis shows that using frequent itemset comparison for dataset comparison is not as straightforward as related work has argued. This result is verified through an experimental study and opens issues for further research in the KDD field.
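
To make the two parameters named in the abstract concrete, the following is a minimal Python sketch, not the authors' method. It mines frequent itemsets by brute force from two toy datasets, reduces them to closed or maximal itemsets, and compares the resulting sets. The Jaccard distance over itemsets, the function names, and the toy transactions are all assumptions chosen for illustration; the abstract does not specify which dissimilarity measure the paper uses.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: absolute count} for itemsets whose relative support
    is at least min_support. Brute-force enumeration, for tiny inputs only."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        found = False
        for cand in combinations(items, k):
            itemset = frozenset(cand)
            count = sum(1 for t in transactions if itemset <= t)
            if count / n >= min_support:
                result[itemset] = count
                found = True
        if not found:
            # Anti-monotonicity: no frequent k-itemset means no frequent (k+1)-itemset.
            break
    return result


def closed_itemsets(freq):
    """Keep itemsets that have no proper superset with the same support."""
    return {s: c for s, c in freq.items()
            if not any(s < t and freq[t] == c for t in freq)}


def maximal_itemsets(freq):
    """Keep itemsets that have no frequent proper superset."""
    return {s: c for s, c in freq.items() if not any(s < t for t in freq)}


def jaccard_dissimilarity(a, b):
    """Assumed measure: 1 - |A ∩ B| / |A ∪ B| over the itemsets (supports ignored)."""
    keys_a, keys_b = set(a), set(b)
    union = keys_a | keys_b
    if not union:
        return 0.0
    return 1.0 - len(keys_a & keys_b) / len(union)


# Hypothetical toy datasets: each transaction is a set of items.
D1 = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
D2 = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"b", "d"}, {"a", "b", "d"}]

levels = [
    ("frequent", lambda f: f),
    ("closed", closed_itemsets),
    ("maximal", maximal_itemsets),
]

for min_support in (0.4, 0.6, 0.8):
    f1 = frequent_itemsets(D1, min_support)
    f2 = frequent_itemsets(D2, min_support)
    for name, reduce_level in levels:
        d = jaccard_dissimilarity(reduce_level(f1), reduce_level(f2))
        print(f"minSupport={min_support:.1f} level={name:8s} dissimilarity={d:.3f}")
```

Running the loop on the same pair of datasets yields a different score for different combinations of minSupport and compactness level, which is the kind of parameter dependency the abstract sets out to examine. A set-overlap measure that ignores support values is only one possible choice; a measure that also weighs support differences would behave differently.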