Statistical properties of transactional databases

Authors:
P. Palmerini;S. Orlando;R. Perego
Affiliations:
ISTI-CNR, Pisa, Italy;Università Ca' Foscari, Venice, Italy;ISTI-CNR, Pisa, Italy
Venue:
Proceedings of the 2004 ACM symposium on Applied computing
Year:
2004

Abstract

Most of the complexity of common data mining tasks stems from the unknown amount of information contained in the data being mined. The more patterns and correlations such data contain, the more resources are needed to extract them. This is confirmed by the fact that, in general, no single algorithm is best for a given data mining task on every possible kind of input dataset. Rather, to achieve good performance, strategies and optimizations have to be chosen according to the specific characteristics of the dataset. For example, one typical distinction in transactional databases is between sparse and dense datasets. In this paper we consider Frequent Set Counting as a case study for data mining algorithms. We propose a statistical analysis of the properties of transactional datasets that allows the complexity of a dataset to be characterized. We show how this characterization can be used in many areas, from performance prediction to optimization.
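
As a rough illustration of the sparse/dense distinction mentioned above, the following sketch compares two toy transactional datasets. It is not the statistical analysis proposed in the paper: the `density` measure (fraction of filled cells in the transaction-by-item matrix), the brute-force counter, the function names, and the toy data are all assumptions made for this example only.

```python
from itertools import combinations


def density(transactions):
    """Fraction of filled cells in the transaction-by-item binary matrix.

    Assumed, simplistic measure for illustration; expects a non-empty
    list of non-empty sets.
    """
    items = set().union(*transactions)
    filled = sum(len(t) for t in transactions)
    return filled / (len(transactions) * len(items))


def count_frequent_itemsets(transactions, min_support):
    """Count itemsets contained in at least min_support transactions.

    Brute-force enumeration, level by level; only suitable for toy inputs.
    """
    items = sorted(set().union(*transactions))
    total = 0
    for k in range(1, len(items) + 1):
        found = 0
        for candidate in combinations(items, k):
            c = set(candidate)
            support = sum(1 for t in transactions if c <= t)
            if support >= min_support:
                found += 1
        if found == 0:
            # Support is anti-monotone: if no k-itemset is frequent,
            # no larger itemset can be frequent either.
            break
        total += found
    return total


# Hypothetical toy datasets, four transactions each.
sparse = [{"a", "b"}, {"c"}, {"d", "e"}, {"a", "f"}]
dense = [{"a", "b", "c"}, {"a", "b", "c", "d"}, {"a", "b", "d"}, {"b", "c", "d"}]

for name, data in (("sparse", sparse), ("dense", dense)):
    print(name, round(density(data), 4), count_frequent_itemsets(data, 2))
```

With the same number of transactions and the same minimum support of 2, the dense toy dataset yields 13 frequent itemsets against 1 for the sparse one, which is the kind of gap in mining effort that motivates characterizing a dataset before choosing an algorithm.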