Worst Case and a Distribution-Based Case Analyses of Sampling for Rule Discovery Based on Generality and Accuracy

Authors:
Einoshin Suzuki
Affiliations:
Division of Electrical and Computer Engineering, Faculty of Engineering, Yokohama National University, 79-5, Tokiwadai, Hodogaya, Yokohama 240-8501, Japan. suzuki@ynu.ac.jp
Venue:
Applied Intelligence
Year:
2005

Abstract

In this paper, we propose two sampling theories of rule discovery based on generality and accuracy. The first theory concerns the worst case: it extends a preliminary version of PAC learning, which represents a worst-case analysis for classification. In our analysis, a rule is defined as a probabilistic constraint of true assignment to the class attribute for the corresponding examples, and we mainly analyze the case in which we try to avoid finding a bad rule. The effectiveness of our approach is demonstrated through examples of conjunction-rule discovery. The second theory concerns a distribution-based case: it gives the conditions under which a rule exceeds pre-specified thresholds for generality and accuracy with high reliability. The idea is to assume a 2-dimensional normal distribution for two probabilistic variables and to obtain the conditions from their confidence region. This approach has been validated experimentally on 21 benchmark data sets from the machine learning community against conventional methods, each of which evaluates the reliability of generality. Related work on PAC learning, multiple comparisons, and the analysis of association-rule discovery is also discussed.
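
To make the distribution-based idea more concrete, the following is a minimal sketch of a threshold test for a single rule evaluated on a sample. It is not the joint confidence-region condition described in the abstract: as a simplifying assumption, it replaces the 2-dimensional normal region with two separate one-sided normal-approximation lower bounds, one for generality and one for accuracy, and splits the allowed error probability between them (a Bonferroni-style split). The function names, the parameters `theta_generality`, `theta_accuracy`, and `delta`, and the count-based interface are all hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def lower_bound(successes: int, trials: int, alpha: float) -> float:
    """One-sided normal-approximation lower bound for a proportion."""
    p_hat = successes / trials
    z = NormalDist().inv_cdf(1.0 - alpha)  # standard normal quantile
    return p_hat - z * sqrt(p_hat * (1.0 - p_hat) / trials)


def rule_passes(n: int, n_premise: int, n_both: int,
                theta_generality: float, theta_accuracy: float,
                delta: float = 0.05) -> bool:
    """Check whether a rule 'premise -> conclusion' clears both thresholds.

    n         : number of sampled examples
    n_premise : examples that satisfy the premise
    n_both    : examples that satisfy premise and conclusion
    Assumption: the error budget delta is split evenly between the two
    bounds instead of using a joint 2-dimensional confidence region.
    """
    if n_premise == 0:
        return False  # accuracy is undefined when the premise never holds
    alpha = delta / 2.0
    generality_lb = lower_bound(n_premise, n, alpha)     # estimate n_premise / n
    accuracy_lb = lower_bound(n_both, n_premise, alpha)  # estimate n_both / n_premise
    return generality_lb >= theta_generality and accuracy_lb >= theta_accuracy


if __name__ == "__main__":
    # Same empirical generality (0.3), different sample sizes.
    print(rule_passes(1000, 300, 270, theta_generality=0.2, theta_accuracy=0.8))
    print(rule_passes(50, 15, 13, theta_generality=0.2, theta_accuracy=0.8))
```

The two calls illustrate why sample size matters: both samples show an empirical generality of 0.3, but the smaller sample leaves too much uncertainty for the lower bound to clear the generality threshold. A joint region, as in the abstract, would additionally account for the dependence between the two estimates, which this sketch ignores.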