Characterising the differences between two databases is a frequently occurring problem in data mining. Detecting change over time is a prime example; comparing the databases of two branches is another. The key problem is to discover the patterns that describe the difference. Emerging patterns provide only a partial answer to this question. In previous work, we showed that the data distribution can be captured in a pattern-based model using compression [12]. Here, we extend this approach to define a generic dissimilarity measure on databases. Moreover, we show that this approach can identify the patterns that characterise the differences between two distributions. Experimental results show that our method provides a well-founded way to independently measure database dissimilarity while allowing thorough inspection of the actual differences. This illustrates the use of our approach in real-world data mining.
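To make the idea of a compression-based dissimilarity concrete, the following is a minimal sketch, not the method of the paper. It assumes a deliberately simple stand-in model: each database is summarised only by smoothed single-item frequencies, and an item's code length is its Shannon length under that model. The function names (`code_lengths`, `encoded_size`, `dissimilarity`) and the choice to take the larger of the two relative size increases are assumptions made for illustration. Both databases are assumed to be non-empty.

```python
import math
from collections import Counter


def code_lengths(db, alphabet):
    """Shannon code length (in bits) per item, from Laplace-smoothed item counts.

    Stand-in for a pattern-based compressor: only single items are modelled.
    """
    counts = Counter(item for transaction in db for item in transaction)
    total = sum(counts.values()) + len(alphabet)  # +1 per item for smoothing
    return {item: -math.log2((counts[item] + 1) / total) for item in alphabet}


def encoded_size(db, lengths):
    """Total number of bits needed to encode every item occurrence in db."""
    return sum(lengths[item] for transaction in db for item in transaction)


def dissimilarity(x, y):
    """Larger relative increase in encoded size when each database is
    compressed with the other's model instead of its own (illustrative choice).
    """
    alphabet = {i for t in x for i in t} | {i for t in y for i in t}
    model_x = code_lengths(x, alphabet)
    model_y = code_lengths(y, alphabet)
    x_own, x_other = encoded_size(x, model_x), encoded_size(x, model_y)
    y_own, y_other = encoded_size(y, model_y), encoded_size(y, model_x)
    return max((x_other - x_own) / x_own, (y_other - y_own) / y_own)


# Hypothetical toy data: two small transaction databases.
shop_a = [{"bread", "milk"}, {"bread", "butter"}, {"milk"}]
shop_b = [{"beer", "chips"}, {"beer"}, {"chips", "milk"}]
score_same = dissimilarity(shop_a, shop_a)
score_diff = dissimilarity(shop_a, shop_b)
```

In this sketch a database compared with itself scores zero, and the score grows as the item distributions diverge. A model built from itemsets rather than single items would additionally expose which patterns account for the difference, which is the inspection step the abstract refers to.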