Maximum entropy based significance of itemsets

Authors:
Nikolaj Tatti
Affiliations:
Helsinki University of Technology, HIIT Basic Research Unit, Department of Computer Science, Helsinki, Finland
Venue:
Knowledge and Information Systems
Year:
2008

Citing 19
Cited 13

The computational complexity of probabilistic inference using Bayesian belief networks (research note)

Artificial Intelligence
Mining association rules between sets of items in large databases

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
On the effective implementation of the iterative proportional fitting procedure

Computational Statistics & Data Analysis - Special issue dedicated to Toma´sˇ Havra´nek
Dynamic itemset counting and implication rules for market basket data

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Beyond market baskets: generalizing association rules to correlations

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Fast discovery of association rules

Advances in knowledge discovery and data mining
A new framework for itemset generation

PODS '98 Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Efficient mining of emerging patterns: discovering trends and differences

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Using association rules for product assortment decisions: a case study

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
KDD-Cup 2000 organizers' report: peeling the onion

ACM SIGKDD Explorations Newsletter - Special issue on “Scalable data mining algorithms”
Empirical bayes screening for multi-item associations

Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Alternative Interest Measures for Mining Associations in Databases

IEEE Transactions on Knowledge and Data Engineering
Mining All Non-derivable Frequent Itemsets

PKDD '02 Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery
Pruning Redundant Association Rules Using Maximum Entropy Principle

PAKDD '02 Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Beyond Independence: Probabilistic Models for Query Approximation on Binary Transaction Data

IEEE Transactions on Knowledge and Data Engineering
Computational complexity of queries based on itemsets

Information Processing Letters
Discovering significant rules

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Finding low-entropy sets and trees from binary data

Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Maximum Entropy Based Significance of Itemsets

ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining

Approximating the number of frequent sets in dense data

Knowledge and Information Systems
A framework for mining interesting pattern sets

Proceedings of the ACM SIGKDD Workshop on Useful Patterns
Probably the best itemsets

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Using background knowledge to rank itemsets

Data Mining and Knowledge Discovery
Constructing classification features using minimal predictive patterns

CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
A concise representation of association rules using minimal predictive rules

ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part I
A framework for mining interesting pattern sets

ACM SIGKDD Explorations Newsletter
An information theoretic framework for data mining

Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Tell me what i need to know: succinctly summarizing data with itemsets

Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Maximum entropy models and subjective interestingness: an application to tiles in binary databases

Data Mining and Knowledge Discovery
Ranking sequential patterns with respect to significance

PAKDD'10 Proceedings of the 14th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part I
Summarizing data succinctly with the most informative itemsets

ACM Transactions on Knowledge Discovery from Data (TKDD) - Special Issue on the Best of SIGKDD 2011
Pattern-based solution risk model for strategic IT outsourcing

ICDM'13 Proceedings of the 13th international conference on Advances in Data Mining: applications and theoretical aspects

Quantified Score

Hi-index	0.00

Visualization

Abstract

We consider the problem of defining the significance of an itemset. We say that the itemset is significant if we are surprised by its frequency when compared to the frequencies of its sub-itemsets. In other words, we estimate the frequency of the itemset from the frequencies of its sub-itemsets and compute the deviation between the real value and the estimate. For the estimation we use Maximum Entropy and for measuring the deviation we use Kullback–Leibler divergence. A major advantage compared to the previous methods is that we are able to use richer models whereas the previous approaches only measure the deviation from the independence model. We show that our measure of significance goes to zero for derivable itemsets and that we can use the rank as a statistical test. Our empirical results demonstrate that for our real datasets the independence assumption is too strong but applying more flexible models leads to good results.