Most of the complexity of common data mining tasks stems from the unknown amount of information contained in the data being mined. The more patterns and correlations the data contain, the more resources are needed to extract them. This is confirmed by the fact that, in general, no single algorithm is best for a given data mining task on every possible kind of input dataset. Rather, to achieve good performance, strategies and optimizations have to be adopted according to dataset-specific characteristics. For example, one typical distinction in transactional databases is between sparse and dense datasets. In this paper we consider Frequent Set Counting as a case study for data mining algorithms. We propose a statistical analysis of the properties of transactional datasets that allows for a characterization of dataset complexity. We show how such a characterization can be used in many fields, from performance prediction to optimization.
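The sparse versus dense distinction can be illustrated with a minimal Python sketch. The density measure used here (fraction of filled cells in the transaction-by-item matrix), the brute-force counter, and the two toy datasets are illustrative assumptions, not the statistical analysis proposed in the paper. The sketch only shows that, at the same support threshold, a denser dataset can yield many more frequent itemsets and therefore more work for a Frequent Set Counting algorithm.

```python
from itertools import combinations


def density(transactions):
    """Fraction of filled cells in the transaction-by-item binary matrix.

    Assumed, simplistic measure of how dense a dataset is.
    """
    items = set().union(*transactions)
    total_cells = len(transactions) * len(items)
    filled = sum(len(t) for t in transactions)
    return filled / total_cells


def frequent_itemsets(transactions, min_support):
    """Brute-force Frequent Set Counting (for illustration only).

    Returns a dict mapping each itemset contained in at least
    `min_support` transactions to its support count.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, k):
            itemset = frozenset(candidate)
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                result[itemset] = support
                found = True
        if not found:
            # Every superset of an infrequent itemset is infrequent,
            # so no larger frequent itemset can exist.
            break
    return result


# Hypothetical toy datasets: four transactions each over items a..d.
sparse = [frozenset(t) for t in ({"a"}, {"b"}, {"c"}, {"d"})]
dense = [
    frozenset(t)
    for t in (
        {"a", "b", "c"},
        {"a", "b", "c"},
        {"a", "b", "d"},
        {"a", "b", "c", "d"},
    )
]

for name, data in (("sparse", sparse), ("dense", dense)):
    found = frequent_itemsets(data, min_support=2)
    print(name, density(data), len(found))
# sparse 0.25 0
# dense 0.8125 11
```

With a minimum support of 2, the sparse toy dataset contains no frequent itemsets, while the dense one contains 11. A single density figure is a coarse summary, which is one reason a finer characterization of dataset complexity is of interest.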