Maximally informative k-itemsets and their efficient discovery

Authors: Arno J. Knobbe (Kiminkii, Houten, The Netherlands & Utrecht University, Utrecht, The Netherlands); Eric K. Y. Ho (Kiminkii, Houten, The Netherlands)
Venue: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Year: 2006

Abstract

In this paper we present a new approach to mining binary data. We treat each binary feature (item) as a means of distinguishing two sets of examples. Our interest is in selecting, from the total set of items, an itemset of a specified size such that the database is partitioned with as uniform a distribution over the parts as possible. To achieve this goal, we propose joint entropy as a quality measure for itemsets, and refer to optimal itemsets of cardinality k as maximally informative k-itemsets. We claim that this approach maximises distinctive power and minimises redundancy within the feature set. We present a number of algorithms for computing optimal itemsets efficiently.
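
To make the quality measure concrete: a k-itemset splits the database into at most 2^k parts, one per combination of item values, and the joint entropy of the itemset is largest (k bits) when those parts are all the same size. The following is a minimal sketch of that idea, not the algorithms from the paper. The function names `joint_entropy` and `maximally_informative_itemset`, the toy data in `rows`, and the exhaustive search over all k-subsets are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations
from math import log2


def joint_entropy(rows, itemset):
    """Joint entropy (in bits) of the binary items in `itemset`.

    `rows` is a list of equal-length 0/1 tuples (one per example);
    `itemset` is a tuple of column indices. Hypothetical helper.
    """
    n = len(rows)
    # Each distinct combination of item values defines one part of the partition.
    parts = Counter(tuple(row[i] for i in itemset) for row in rows)
    return -sum((c / n) * log2(c / n) for c in parts.values())


def maximally_informative_itemset(rows, k):
    """Brute-force stand-in: score every k-subset and keep the best.

    This is exponential in the number of items and is NOT one of the
    efficient algorithms the abstract refers to.
    """
    n_items = len(rows[0])
    return max(combinations(range(n_items), k),
               key=lambda s: joint_entropy(rows, s))


# Toy database (assumed data): item 1 duplicates item 0,
# item 2 is balanced and independent of item 0, item 3 is heavily skewed.
rows = [
    (0, 0, 0, 0),
    (0, 0, 0, 0),
    (0, 0, 1, 0),
    (0, 0, 1, 0),
    (1, 1, 0, 0),
    (1, 1, 0, 0),
    (1, 1, 1, 0),
    (1, 1, 1, 1),
]

best = maximally_informative_itemset(rows, 2)
print(best, joint_entropy(rows, best))
```

On this toy data the redundant pair (items 0 and 1) scores only 1 bit, because the second item adds no new distinction, while a pair of balanced, independent items reaches the 2-bit maximum. This is the sense in which maximising joint entropy rewards distinctive power and penalises redundancy at the same time.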