Maximum entropy models and subjective interestingness: an application to tiles in binary databases

Authors:
Tijl De Bie
Affiliations:
Intelligent Systems Laboratory, University of Bristol, Bristol, UK
Venue:
Data Mining and Knowledge Discovery
Year:
2011

References:
Elements of information theory
Using association rules for product assortment decisions: a case study. KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
The budgeted maximum coverage problem. Information Processing Letters
Small is beautiful: discovering the minimal set of unexpected patterns. Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Fast Algorithms for Mining Association Rules in Large Databases. VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
An Introduction to the Conjugate Gradient Method Without the Agonizing Pain
Beyond Independence: Probabilistic Models for Query Approximation on Binary Transaction Data. IEEE Transactions on Knowledge and Data Engineering
Mining dependence rules by finding largest itemset support quota. Proceedings of the 2004 ACM symposium on Applied computing
Convex Optimization
Interestingness of frequent itemsets using Bayesian networks as background knowledge. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Geometric and combinatorial tiles in 0-1 data. PKDD '04 Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases
Interestingness measures for data mining: A survey. ACM Computing Surveys (CSUR)
Assessing data mining results via swap randomization. ACM Transactions on Knowledge Discovery from Data (TKDD)
Itemset frequency satisfiability: Complexity and axiomatization. Theoretical Computer Science
MINI: Mining Informative Non-redundant Itemsets. PKDD 2007 Proceedings of the 11th European conference on Principles and Practice of Knowledge Discovery in Databases
Randomization Techniques for Data Mining Methods. ADBIS '08 Proceedings of the 12th East European conference on Advances in Databases and Information Systems
The Discrete Basis Problem. IEEE Transactions on Knowledge and Data Engineering
Maximum entropy based significance of itemsets. Knowledge and Information Systems
Graphical Models, Exponential Families, and Variational Inference. Foundations and Trends® in Machine Learning
Tell me something I don't know: randomization strategies for iterative data mining. Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining


Abstract

Recent research has highlighted the practical benefits of subjective interestingness measures, which quantify the novelty or unexpectedness of a pattern when contrasted with any prior information held by the data miner (Silberschatz and Tuzhilin, Proceedings of the 1st ACM SIGKDD international conference on Knowledge discovery and data mining (KDD95), 1995; Geng and Hamilton, ACM Comput Surv 38(3):9, 2006). A key challenge here is the formalization of this prior information in a way that lends itself to the definition of a subjective interestingness measure that is both meaningful and practical. In this paper, we outline a general strategy for achieving this, before working out the details for a use case that is important in its own right. Our general strategy is based on considering prior information as constraints on a probabilistic model representing the uncertainty about the data. More specifically, we represent the prior information by the maximum entropy (MaxEnt) distribution subject to these constraints. We briefly outline various measures that could subsequently be used to contrast patterns with this MaxEnt model, thus quantifying their subjective interestingness. We demonstrate this strategy for rectangular databases in which the row and column sums are known as prior information. This situation has been considered before using computationally intensive approaches based on swap randomizations, allowing for the computation of empirical p-values as interestingness measures (Gionis et al., ACM Trans Knowl Discov Data 1(3):14, 2007). We show how the MaxEnt model can be computed remarkably efficiently in this situation, and how it can be used for the same purpose as swap randomizations but at lower computational cost. More importantly, being an explicitly represented distribution, the MaxEnt model can additionally be used to define analytically computable interestingness measures, as we demonstrate for tiles (Geerts et al., Proceedings of the 7th international conference on Discovery science (DS04), 2004) in binary databases.
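To make the row-and-column-sum setting concrete, the following is a minimal Python sketch, not the paper's own algorithm. It assumes the row and column sums are imposed in expectation, in which case the MaxEnt distribution over binary matrices factorizes into independent Bernoulli cells with P(D_ij = 1) = 1 / (1 + exp(-(lam_i + mu_j))). The names `fit_maxent_margins` and `tile_information` are hypothetical, the plain gradient-ascent solver is a stand-in chosen for brevity, and the tile score shown (the negative log-probability that every cell of the tile equals 1) is only one illustrative quantity that such a model makes analytically computable.

```python
import numpy as np

def fit_maxent_margins(D, n_iter=20000, step=2.0, tol=1e-9):
    """Fit cell probabilities P_ij = sigmoid(lam_i + mu_j) whose expected
    row and column sums match those of the binary matrix D.

    Assumption: margins are constrained in expectation, and D has no
    all-zero or all-one rows or columns (otherwise parameters diverge).
    """
    D = np.asarray(D, dtype=float)
    n, m = D.shape
    row_sums, col_sums = D.sum(axis=1), D.sum(axis=0)
    lam, mu = np.zeros(n), np.zeros(m)  # one parameter per row / column
    for _ in range(n_iter):
        # Current model probabilities for every cell.
        P = 1.0 / (1.0 + np.exp(-(lam[:, None] + mu[None, :])))
        # Gradient of the concave dual: observed minus expected margins.
        g_row = row_sums - P.sum(axis=1)
        g_col = col_sums - P.sum(axis=0)
        if max(np.abs(g_row).max(), np.abs(g_col).max()) < tol:
            break
        # Scaled gradient-ascent step (simple stand-in solver).
        lam += step * g_row / m
        mu += step * g_col / n
    return P

def tile_information(P, rows, cols):
    """Negative log2-probability that all cells in the tile rows x cols
    are 1 under the independent-cell model P (illustrative score)."""
    return float(-np.log2(P[np.ix_(rows, cols)]).sum())

# Small hypothetical database: 4 transactions (rows) over 4 items (columns).
D = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 0, 0]])
P = fit_maxent_margins(D)
print(P.round(3))
print(tile_information(P, [0, 1], [0, 1]))
```

Because the fitted model is an explicit distribution, a score such as `tile_information` can be evaluated directly from `P`, whereas an empirical p-value from swap randomization would require generating many randomized copies of the database.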