Fast discovery of unexpected patterns in data, relative to a Bayesian network

Authors:
Szymon Jaroszewicz; Tobias Scheffer
Affiliations:
Technical University of Szczecin, Szczecin, Poland; Humboldt-Universität zu Berlin, Berlin, Germany
Venue:
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Year:
2005

Abstract

We consider a model in which background knowledge on a given domain of interest is available in terms of a Bayesian network, in addition to a large database. The mining problem is to discover unexpected patterns: our goal is to find the strongest discrepancies between network and database. This problem is intrinsically difficult because it requires inference in a Bayesian network and processing the entire, potentially very large, database. A sampling-based method that we introduce is efficient and yet provably finds the approximately most interesting unexpected patterns. We give a rigorous proof of the method's correctness. Experiments shed light on its efficiency and practicality for large-scale Bayesian networks and databases.
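
To make the notion of a "discrepancy between network and database" concrete, the following is a minimal sketch and not the authors' algorithm. It assumes that the interestingness of an attribute set is the largest absolute difference between the probability of a value combination under the network and its relative frequency in the data, and it estimates both from fixed-size samples. The abstract's method chooses sample sizes so that the result is provably near-optimal; this sketch makes no such guarantee. The two-attribute network, the synthetic database, and all function names are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def empirical_marginal(rows, attrs):
    """Relative frequency of each value combination of `attrs` in `rows`."""
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    n = len(rows)
    return {values: count / n for values, count in counts.items()}


def sample_network(n, rng):
    """Forward-sample a hypothetical network in which A and B are independent."""
    return [{"A": int(rng.random() < 0.5), "B": int(rng.random() < 0.5)}
            for _ in range(n)]


def sample_database(n, rng):
    """Hypothetical data in which B copies A 90% of the time."""
    rows = []
    for _ in range(n):
        a = int(rng.random() < 0.5)
        b = a if rng.random() < 0.9 else 1 - a
        rows.append({"A": a, "B": b})
    return rows


def discrepancy(net_rows, data_rows, attrs):
    """Largest absolute gap between network and data estimates over all
    value combinations of `attrs` (assumed interestingness measure)."""
    p_net = empirical_marginal(net_rows, attrs)
    p_data = empirical_marginal(data_rows, attrs)
    keys = set(p_net) | set(p_data)
    return max(abs(p_net.get(k, 0.0) - p_data.get(k, 0.0)) for k in keys)


def rank_attribute_sets(net_rows, data_rows, names, max_size=2):
    """Score every attribute set up to `max_size`, most discrepant first."""
    scored = []
    for size in range(1, max_size + 1):
        for attrs in combinations(names, size):
            scored.append((discrepancy(net_rows, data_rows, attrs), attrs))
    return sorted(scored, reverse=True)


rng = random.Random(0)
ranking = rank_attribute_sets(
    sample_network(20000, rng), sample_database(20000, rng), ["A", "B"]
)
for score, attrs in ranking:
    print(attrs, round(score, 3))
```

In this toy setting the single attributes agree between network and data, while the pair (A, B) stands out because the network treats the attributes as independent and the data does not. That is the kind of unexpected pattern the abstract refers to.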