Filling in the Blanks - Krimp Minimisation for Missing Data

Authors:
Jilles Vreeken;Arno Siebes
Affiliations:
-;-
Venue:
ICDM '08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining
Year:
2008

Citing 0
Cited 8

Missing Values: Proposition of a Typology and Characterization with an Association Rule-Based Model

DaWaK '09 Proceedings of the 11th International Conference on Data Warehousing and Knowledge Discovery
Making pattern mining useful

ACM SIGKDD Explorations Newsletter
Krimp: mining itemsets that compress

Data Mining and Knowledge Discovery
Model order selection for boolean matrix factorization

Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
The long and the short of it: summarising event sequences with serial episodes

Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Smoothing categorical data

ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part I
Effects of data set features on the performances of classification algorithms

Expert Systems with Applications: An International Journal
Queries for data analysis

IDA'12 Proceedings of the 11th international conference on Advances in Intelligent Data Analysis

Quantified Score

Hi-index	0.00

Visualization

Abstract

Many data sets are incomplete. For correct analysis of such data, one can either use algorithms that are designed to handle missing data or use imputation. Imputation has the benefit that it allows for any type of data analysis. Obviously, this can only lead to proper conclusions if the provided data completion is both highly accurate and maintains all statistics of the original data. In this paper, we present three data completion methods that are built on the MDL-based {\sc Krimp} algorithm. Here, we also follow the MDL principle, i.e. the completed database that can be compressed best, is the best completion because it adheres best to the patterns in the data. By using local patterns, as opposed to a global model, Krimp captures the structure of the data in detail. Experiments show that both in terms of accuracy and expected differences of any marginal, better data reconstructions are provided than the state of the art, Structural EM.