Characterising the difference

Authors:
Jilles Vreeken;Matthijs van Leeuwen;Arno Siebes
Affiliations:
Universiteit Utrecht;Universiteit Utrecht;Universiteit Utrecht
Venue:
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2007

References (7)

Efficient mining of emerging patterns: discovering trends and differences
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining

Towards parameter-free data mining
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining

Mining border descriptions of emerging patterns from dataset pairs
Knowledge and Information Systems

Reducing the Frequent Pattern Set
ICDMW '06 Proceedings of the Sixth IEEE International Conference on Data Mining - Workshops

Compression picks item sets that matter
PKDD'06 Proceedings of the 10th European conference on Principle and Practice of Knowledge Discovery in Databases

The similarity metric
IEEE Transactions on Information Theory

Clustering by compression
IEEE Transactions on Information Theory

Cited by (11)

Succinct summarization of transactional databases: an overlapped hyperrectangle scheme
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining

StreamKrimp: Detecting Change in Data Streams
ECML PKDD '08 Proceedings of the 2008 European Conference on Machine Learning and Knowledge Discovery in Databases - Part I

Mining influential attributes that capture class and group contrast behaviour
Proceedings of the 17th ACM conference on Information and knowledge management

Identifying the components
Data Mining and Knowledge Discovery

Compressing tags to find interesting media groups
Proceedings of the 18th ACM conference on Information and knowledge management

Making pattern mining useful
ACM SIGKDD Explorations Newsletter

Krimp: mining itemsets that compress
Data Mining and Knowledge Discovery

Summarizing transactional databases with overlapped hyperrectangles
Data Mining and Knowledge Discovery

Comparing apples and oranges: measuring differences between data mining results
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part III

Pattern change discovery between high dimensional data sets
Proceedings of the 20th ACM international conference on Information and knowledge management

Cross domain similarity mining: research issues and potential applications including supporting research by analogy
ACM SIGKDD Explorations Newsletter

Abstract

Characterising the differences between two databases is a frequently occurring problem in data mining. Detecting change over time is a prime example; comparing databases from two branches is another. The key problem is to discover the patterns that describe the difference. Emerging patterns provide only a partial answer to this question. In previous work, we showed that the data distribution can be captured in a pattern-based model using compression [12]. Here, we extend this approach to define a generic dissimilarity measure on databases. Moreover, we show that this approach can identify the patterns that characterise the differences between two distributions. Experimental results show that our method provides a well-founded way to independently measure database dissimilarity, one that allows for thorough inspection of the actual differences. This illustrates the usefulness of our approach in real-world data mining.
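
The idea of a compression-based dissimilarity measure can be illustrated with a small sketch. It is not the authors' method: the abstract describes a pattern-based model, whereas this example assumes a much simpler stand-in, namely smoothed single-item frequencies with Shannon code lengths. The formula used (the relative growth in encoded size when a database is encoded with the other database's model, symmetrised by taking the maximum) is likewise an assumption made for illustration, and all names (`fit_model`, `encoded_size`, `dissimilarity`, `branch_a`, `branch_b`) are hypothetical.

```python
import math
from collections import Counter


def fit_model(db, vocabulary):
    """Smoothed item probabilities: an assumed stand-in for a pattern-based model."""
    counts = Counter(item for transaction in db for item in transaction)
    total = sum(counts.values()) + len(vocabulary)  # add-one smoothing
    return {item: (counts[item] + 1) / total for item in vocabulary}


def encoded_size(db, model):
    """Total Shannon code length, in bits, of every item occurrence in db."""
    return sum(-math.log2(model[item]) for transaction in db for item in transaction)


def dissimilarity(db_x, db_y):
    """Assumed measure: largest relative growth in size under the other model.

    Expects non-empty databases that together contain at least two distinct items.
    """
    vocabulary = sorted({item for db in (db_x, db_y) for t in db for item in t})
    model_x = fit_model(db_x, vocabulary)
    model_y = fit_model(db_y, vocabulary)
    own_x = encoded_size(db_x, model_x)    # x encoded with its own model
    own_y = encoded_size(db_y, model_y)    # y encoded with its own model
    cross_x = encoded_size(db_x, model_y)  # x encoded with y's model
    cross_y = encoded_size(db_y, model_x)  # y encoded with x's model
    return max((cross_x - own_x) / own_x, (cross_y - own_y) / own_y)


# Made-up transaction databases for two hypothetical branches.
branch_a = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}]
branch_b = [{"beer", "crisps"}, {"beer", "milk"}, {"beer", "crisps", "milk"}]

print(round(dissimilarity(branch_a, branch_b), 3))
print(dissimilarity(branch_a, branch_a))
```

In this sketch a database compared with itself scores zero, and the score grows as the other database's model becomes a worse fit. Comparing per-item code lengths under the two models would be one way to inspect which items drive the difference.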