Using background knowledge to rank itemsets

Authors:
Nikolaj Tatti;Michael Mampaey
Affiliations:
Universiteit Antwerpen, Antwerp, Belgium;Universiteit Antwerpen, Antwerp, Belgium
Venue:
Data Mining and Knowledge Discovery
Year:
2010

Citing 15
Cited 5

Beyond market baskets: generalizing association rules to correlations

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Fast discovery of association rules

Advances in knowledge discovery and data mining
A new framework for itemset generation

PODS '98 Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Using association rules for product assortment decisions: a case study

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Theory of dependence values

ACM Transactions on Database Systems (TODS)
KDD-Cup 2000 organizers' report: peeling the onion

ACM SIGKDD Explorations Newsletter - Special issue on “Scalable data mining algorithms”
Probabilistic Networks and Expert Systems

Probabilistic Networks and Expert Systems
Scalable Algorithms for Association Mining

IEEE Transactions on Knowledge and Data Engineering
Interestingness of frequent itemsets using Bayesian networks as background knowledge

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Fast discovery of unexpected patterns in data, relative to a Bayesian network

Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Nestedness and segmented nestedness

Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Assessing data mining results via swap randomization

ACM Transactions on Knowledge Discovery from Data (TKDD)
Banded structure in binary matrices

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Maximum entropy based significance of itemsets

Knowledge and Information Systems
Tell me something I don't know: randomization strategies for iterative data mining

Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining

Tell me what i need to know: succinctly summarizing data with itemsets

Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Summarizing data succinctly with the most informative itemsets

ACM Transactions on Knowledge Discovery from Data (TKDD) - Special Issue on the Best of SIGKDD 2011
Knowledge discovery interestingness measures based on unexpectedness

Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
Formalizing complex prior information to quantify subjective interestingness of frequent pattern sets

IDA'12 Proceedings of the 11th international conference on Advances in Intelligent Data Analysis
Behavior-based clustering and analysis of interestingness measures for association rule mining

Data Mining and Knowledge Discovery

Quantified Score

Hi-index	0.00

Visualization

Abstract

Assessing the quality of discovered results is an important open problem in data mining. Such assessment is particularly vital when mining itemsets, since commonly many of the discovered patterns can be easily explained by background knowledge. The simplest approach to screen uninteresting patterns is to compare the observed frequency against the independence model. Since the parameters for the independence model are the column margins, we can view such screening as a way of using the column margins as background knowledge. In this paper we study techniques for more flexible approaches for infusing background knowledge. Namely, we show that we can efficiently use additional knowledge such as row margins, lazarus counts, and bounds of ones. We demonstrate that these statistics describe forms of data that occur in practice and have been studied in data mining. To infuse the information efficiently we use a maximum entropy approach. In its general setting, solving a maximum entropy model is infeasible, but we demonstrate that for our setting it can be solved in polynomial time. Experiments show that more sophisticated models fit the data better and that using more information improves the frequency prediction of itemsets.