K-optimal pattern discovery: an efficient and effective approach to exploratory data mining

Authors:
Geoffrey I. Webb
Affiliations:
Faculty of Information Technology, Monash University, Vic, Australia
Venue:
AI'05 Proceedings of the 18th Australian Joint conference on Advances in Artificial Intelligence
Year:
2005

Citing 14

Mining association rules between sets of items in large databases
SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data

Efficient search for association rules
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining

Discovering associations with numeric variables
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining

Constraint-Based Rule Mining in Large, Dense Databases
Data Mining and Knowledge Discovery

An Algorithm for Multi-relational Discovery of Subgroups
PKDD '97 Proceedings of the First European Symposium on Principles of Data Mining and Knowledge Discovery

Mining Top-K Frequent Closed Patterns without Minimum Support
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining

Finding Interesting Associations without Support Pruning
ICDE '00 Proceedings of the 16th International Conference on Data Engineering

Finding the most interesting patterns in a database quickly by using sequential sampling
The Journal of Machine Learning Research

SEWeP: using site semantics and a taxonomy to enhance the Web personalization process
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining

On detecting differences between groups
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining

Helping everyday users find anomalies in data feeds

K-Optimal Rule Discovery
Data Mining and Knowledge Discovery

A delivery framework for health data mining and analytics
ACSC '05 Proceedings of the Twenty-eighth Australasian conference on Computer Science - Volume 38

OPUS: an efficient admissible algorithm for unordered search
Journal of Artificial Intelligence Research


Abstract

Most data-mining techniques seek a single model that optimizes an objective function with respect to the data. In many real-world applications, several models will optimize this function equally well. However, they may not all equally satisfy a user’s preferences, which are affected by background knowledge and pragmatic considerations that are infeasible to quantify in an objective function. The program may therefore make arbitrary and potentially suboptimal decisions. In contrast, methods for exploratory pattern discovery seek all models that satisfy user-defined criteria. This allows the user to select between these models, rather than relinquishing control to the program.

Association rule discovery [1] is the best-known example of this approach. However, it is based on the minimum-support technique, by which only patterns that occur in the data more than a user-specified number of times are discovered. While this approach has proved very effective in many applications, it is subject to a number of limitations. It creates an arbitrary discontinuity in the interestingness function, by which one more or one fewer case supporting a pattern can transform its assessment from uninteresting to most interesting. Sometimes the most interesting patterns are very rare [3]. Minimum support may not be relevant to whether a pattern is interesting. It is often difficult to find a minimum support level that results in sufficient but not excessive numbers of patterns being discovered. It cannot handle dense data [2]. It limits the ability to efficiently prune the search space on the basis of constraints that are neither monotone nor anti-monotone with respect to support.

K-optimal pattern discovery [4,5,11,14,15,17-20] is an exploratory technique that finds the k patterns that optimize a user-selected objective function while respecting other user-specified constraints. This strategy avoids the above problems while empowering the user to select between preference criteria and to directly control the number of patterns that are discovered. It also supports statistically sound exploratory pattern discovery [8]. Its effectiveness is demonstrated by a large range of applications [5-10,12,13].
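
To make the idea of "the k patterns that optimize a user-selected objective function" concrete, the following is a minimal sketch under several assumptions not stated in the abstract: patterns are taken to be rules of the form antecedent -> single consequent, the objective is leverage (joint frequency minus the frequency expected under independence), and the only constraint is a cap on antecedent size. The function name `k_optimal_rules`, its parameters, and the toy data are hypothetical. The search is a naive exhaustive enumeration for illustration, not the pruned search used by the algorithms cited above.

```python
import heapq
from itertools import combinations


def k_optimal_rules(transactions, k, max_antecedent=2):
    """Return the k rules with the highest leverage, best first.

    Illustrative brute force: every antecedent of up to `max_antecedent`
    items is paired with every other item as consequent. No minimum-support
    threshold is applied; k alone bounds the number of rules returned.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def cover(itemset):
        # Fraction of transactions that contain every item in `itemset`.
        return sum(1 for t in transactions if itemset <= t) / n

    heap = []  # min-heap of (leverage, antecedent, consequent); worst kept rule on top
    for size in range(1, max_antecedent + 1):
        for antecedent in combinations(items, size):
            a = frozenset(antecedent)
            cover_a = cover(a)
            if cover_a == 0:
                continue  # antecedent never occurs, so the rule is vacuous
            for c in items:
                if c in a:
                    continue
                leverage = cover(a | {c}) - cover_a * cover(frozenset([c]))
                entry = (leverage, antecedent, c)
                if len(heap) < k:
                    heapq.heappush(heap, entry)
                elif entry > heap[0]:
                    heapq.heapreplace(heap, entry)
    return sorted(heap, reverse=True)


# Toy data (assumed for illustration only).
transactions = [
    ["bread", "butter"],
    ["bread", "butter"],
    ["bread", "butter", "milk"],
    ["milk"],
    ["milk", "eggs"],
    ["eggs"],
]
for leverage, antecedent, consequent in k_optimal_rules(transactions, k=2):
    print(f"{' & '.join(antecedent)} -> {consequent}  leverage={leverage:.3f}")
```

In this sketch the user picks k and the objective directly, rather than tuning a support threshold until a manageable number of rules appears. Swapping leverage for another measure only requires changing the line that computes `leverage`.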