A new classification of datasets for frequent itemsets

Authors:
Frédéric Flouvat;Fabien Marchi;Jean-Marc Petit
Affiliations:
University of New Caledonia, PPME, Nouméa, New Caledonia 98851;Université de Lyon, Université Lyon 1, LIRIS, UMR5205 CNRS, Lyon, France 69621;Université de Lyon, INSA-Lyon, LIRIS, UMR5205 CNRS, Lyon, France 69621
Venue:
Journal of Intelligent Information Systems
Year:
2010

Abstract

The discovery of frequent patterns is a well-known problem in data mining. While plenty of algorithms have been proposed during the last decade, only a few contributions have tried to understand the influence of datasets on the algorithms' behavior. Explaining why certain algorithms are likely to perform very well or very poorly on some datasets remains an open question. In this setting, we describe a thorough experimental study of datasets with respect to frequent itemsets. We study the distribution of frequent itemsets with respect to itemset size, together with the distribution of three concise representations: frequent closed, frequent free and frequent essential itemsets. For each of them, we also study the distribution of their positive and negative borders whenever possible. The main outcome of these experiments is a new classification of datasets that is invariant w.r.t. minsup variations and robust enough to explain the efficiency of several implementations.
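
To make the quantities named in the abstract concrete, the following is a minimal sketch that computes, on a toy transaction database, the distribution by itemset size of the frequent, frequent closed and frequent free itemsets, along with the positive and negative borders of the frequent itemsets. The data, the threshold value and all function names are assumptions made for illustration; none of them come from the paper. The brute-force enumeration is not the authors' experimental method, and essential itemsets are left out because their definition is not given here.

```python
from collections import Counter
from itertools import combinations

# Toy transaction database (hypothetical data, for illustration only).
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "d"},
    {"c", "d"},
]
MINSUP = 2  # absolute minimum support threshold (assumed value)


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def all_items(transactions):
    """Sorted list of the distinct items occurring in the database."""
    return sorted(set().union(*transactions))


def frequent_itemsets(transactions, minsup):
    """Brute-force map from each non-empty frequent itemset to its support."""
    items = all_items(transactions)
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            s = support(x, transactions)
            if s >= minsup:
                result[x] = s
    return result


def closed_itemsets(freq):
    """Frequent itemsets with no proper superset of equal support."""
    return {x for x, s in freq.items()
            if not any(x < y and s == sy for y, sy in freq.items())}


def free_itemsets(freq, transactions):
    """Frequent itemsets with no proper subset of equal support.

    Checking the subsets obtained by dropping one item is sufficient,
    because support can only grow when items are removed.
    """
    return {x for x, s in freq.items()
            if all(support(x - {i}, transactions) != s for i in x)}


def positive_border(freq):
    """Maximal frequent itemsets: no proper superset is frequent."""
    return {x for x in freq if not any(x < y for y in freq)}


def negative_border(freq, transactions):
    """Minimal infrequent itemsets: infrequent, all proper subsets frequent.

    Assumes the empty itemset is frequent, i.e. len(transactions) >= minsup.
    """
    items = all_items(transactions)
    border = set()
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            if x not in freq and all(k == 1 or (x - {i}) in freq for i in x):
                border.add(x)
    return border


def size_distribution(itemsets):
    """Map from itemset size to the number of itemsets of that size."""
    return dict(sorted(Counter(len(x) for x in itemsets).items()))


freq = frequent_itemsets(TRANSACTIONS, MINSUP)
print("frequent:", size_distribution(freq))
print("closed:  ", size_distribution(closed_itemsets(freq)))
print("free:    ", size_distribution(free_itemsets(freq, TRANSACTIONS)))
print("positive border:", size_distribution(positive_border(freq)))
print("negative border:", size_distribution(negative_border(freq, TRANSACTIONS)))
```

On this toy database the eight frequent itemsets reduce to five closed and six free itemsets, so even a tiny example shows how the three distributions can differ in shape. The enumeration is exponential in the number of items and is only meant for very small inputs.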