Discovering descriptive tile trees by mining optimal geometric subtiles

Authors:
Nikolaj Tatti; Jilles Vreeken
Affiliations:
Advanced Database Research and Modeling, Universiteit Antwerpen, Belgium (both authors)
Venue:
ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part I
Year:
2012


Abstract

When analysing binary data, the ease with which one can interpret results is very important. Many existing methods, however, either discover models that are difficult to read or return so many results that interpretation becomes impossible. Here, we study a fully automated approach for mining easily interpretable models for binary data. We model data hierarchically with noisy tiles: rectangles whose density differs significantly from that of their parent tile. To identify good trees, we employ the Minimum Description Length (MDL) principle. We propose Stijl, a greedy any-time algorithm for mining good tile trees from binary data. Iteratively, it finds the locally optimal addition to the current tree, allowing overlap with tiles of the same parent. A major result of this paper is that we find the optimal tile in only Θ(NM min(N,M)) time. Stijl can be employed either as a top-k miner or, using MDL, to identify the tree that describes the data best. Experiments show that we find succinct models that accurately summarise the data and that, thanks to their hierarchical structure, are easily interpretable.
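To give a feel for how a search over all geometric (contiguous) subtiles can stay within N·M·min(N,M) steps, here is a minimal sketch. It is not the Stijl algorithm and does not use the paper's MDL score: it swaps in a simple linear score (each cell contributes its value minus a hypothetical `weight`) and applies the classic two-dimensional maximum-sum-subarray technique based on Kadane's algorithm. The function name `best_tile`, the `weight` parameter, and the reading of N and M as the numbers of rows and columns are assumptions made for illustration only.

```python
from typing import List, Tuple

Tile = Tuple[int, int, int, int]  # (top, bottom, left, right), inclusive


def best_tile(data: List[List[int]], weight: float = 0.5) -> Tuple[float, Tile]:
    """Find the contiguous tile maximising sum(cell - weight).

    The linear score is a stand-in chosen for illustration; it is NOT
    the MDL-based objective used in the paper.
    """
    if not data or not data[0]:
        raise ValueError("data must be a non-empty matrix")
    n, m = len(data), len(data[0])

    # Enumerate boundary pairs on the shorter dimension, so the total
    # work is N * M * min(N, M).
    transposed = n > m
    if transposed:
        data = [list(col) for col in zip(*data)]
        n, m = m, n

    best_score = float("-inf")
    best: Tile = (0, 0, 0, 0)
    for top in range(n):
        col = [0.0] * m  # per-column score of rows top..bottom
        for bottom in range(top, n):
            row = data[bottom]
            for j in range(m):
                col[j] += row[j] - weight
            # Kadane's scan: best contiguous column range for this row band.
            run, start = 0.0, 0
            for j in range(m):
                if run <= 0:
                    run, start = col[j], j
                else:
                    run += col[j]
                if run > best_score:
                    best_score = run
                    best = (top, bottom, start, j)

    if transposed:
        # Map the tile found in the transposed matrix back to the original.
        t, b, l, r = best
        best = (l, r, t, b)
    return best_score, best
```

For example, on a 4-by-4 matrix of zeros with a 2-by-2 block of ones in the middle, the function selects exactly that block. A nonlinear objective such as an MDL gain cannot be handled by Kadane's scan directly, which is why the bound claimed in the abstract is a result in its own right.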