Comparison between sets of frequent itemsets has traditionally been used to compare raw datasets, on the assumption that frequent itemsets inherit the information contained in the original data. In this work, we revisit this assumption and examine whether the dissimilarity between sets of frequent itemsets can serve as a measure of dissimilarity between the raw datasets. In particular, we investigate how the dissimilarity between two sets of frequent itemsets is affected by the minSupport threshold used to generate them and by the adopted compactness level of the itemset lattice, namely frequent itemsets, closed frequent itemsets, or maximal frequent itemsets. Our analysis shows that using frequent itemset comparison for dataset comparison is not as straightforward as related work has argued. This result is verified through an experimental study and opens issues for further research in the KDD field.
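To make the two factors concrete, the following is a minimal sketch, not the method of this work. It mines two toy datasets by brute force, derives the frequent, closed, and maximal itemsets, and compares each pair of representations. The dissimilarity function is an assumption: plain Jaccard distance over the itemsets themselves, ignoring support values, chosen only because the abstract does not name a specific measure. All function names and both toy datasets are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force miner: map each frequent itemset to its relative support."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            support = sum(itemset <= t for t in transactions) / n
            if support >= min_support:
                result[itemset] = support
                found = True
        if not found:
            break  # anti-monotonicity: no larger itemset can be frequent
    return result


def closed_itemsets(frequent):
    """Frequent itemsets with no proper superset of equal support."""
    return {s: sup for s, sup in frequent.items()
            if not any(s < other and sup == frequent[other] for other in frequent)}


def maximal_itemsets(frequent):
    """Frequent itemsets with no frequent proper superset."""
    return {s: sup for s, sup in frequent.items()
            if not any(s < other for other in frequent)}


def jaccard_dissimilarity(a, b):
    """Assumed measure: 1 - |A ∩ B| / |A ∪ B| over the itemsets (supports ignored)."""
    union = set(a) | set(b)
    if not union:
        return 0.0
    return 1 - len(set(a) & set(b)) / len(union)


# Hypothetical toy datasets, each a list of transactions.
dataset_a = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("bc")]
dataset_b = [frozenset("ab"), frozenset("ab"), frozenset("c"), frozenset("c")]

if __name__ == "__main__":
    for min_support in (0.5, 0.75):
        fa = frequent_itemsets(dataset_a, min_support)
        fb = frequent_itemsets(dataset_b, min_support)
        print(min_support,
              jaccard_dissimilarity(fa, fb),
              jaccard_dissimilarity(closed_itemsets(fa), closed_itemsets(fb)),
              jaccard_dissimilarity(maximal_itemsets(fa), maximal_itemsets(fb)))
```

On this toy input, the same pair of datasets receives a different dissimilarity score depending on the minSupport threshold and on whether frequent, closed, or maximal itemsets are compared, which is the kind of sensitivity the analysis examines.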