On the Tractability of Rule Discovery from Distributed Data

Authors:
Martin Scholz
Affiliations:
University of Dortmund
Venue:
ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Year:
2005

Citing 6

Explora: a multipattern and multistrategy discovery assistant
Advances in knowledge discovery and data mining

Boosting Algorithms for Parallel and Distributed Learning
Distributed and Parallel Databases - Special issue: Parallel and distributed data mining

Fast Algorithms for Mining Association Rules in Large Databases
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases

Effect of Data Skewness in Parallel Mining of Association Rules
PAKDD '98 Proceedings of the Second Pacific-Asia Conference on Research and Development in Knowledge Discovery and Data Mining

Finding the most interesting patterns in a database quickly by using sequential sampling
The Journal of Machine Learning Research

ROC 'n' Rule Learning - Towards a Better Understanding of Covering Algorithms
Machine Learning

Cited 3

Secure top-k subgroup discovery
PSDML'10 Proceedings of the international ECML/PKDD conference on Privacy and security issues in data mining and machine learning

Distributed subgroup mining
PKDD'06 Proceedings of the 10th European conference on Principle and Practice of Knowledge Discovery in Databases

Secure Distributed Subgroup Discovery in Horizontally Partitioned Data
Transactions on Data Privacy

Abstract

This paper analyses the tractability of rule selection for supervised learning in distributed scenarios. Rule selection is usually guided by a utility measure such as predictive accuracy or weighted relative accuracy. A common strategy for selecting rules from distributed data is to evaluate them locally on each dataset. While this works well for homogeneously distributed data, this work proves that the strategy has limitations when local distributions are allowed to deviate from the global one. Identifying the subsets for which local and global distributions deviate poses a learning task of its own, which is shown to be at least as complex as discovering the globally best rules from local data.
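
To make the gap between local and global evaluation concrete, the following is a minimal sketch and is not taken from the paper. It assumes the standard definition of weighted relative accuracy, WRAcc(A -> C) = P(A) * (P(C|A) - P(C)), and uses invented counts for two hypothetical sites, `site_a` and `site_b`, whose class distributions deviate. The function name `wracc` and all numbers are illustrative assumptions.

```python
from fractions import Fraction


def wracc(covered_pos, covered, pos, total):
    """Weighted relative accuracy from raw counts.

    WRAcc(A -> C) = P(A) * (P(C|A) - P(C)), where A is the rule body
    and C the target class. Exact fractions avoid rounding noise.
    """
    if covered == 0 or total == 0:
        return Fraction(0)
    coverage = Fraction(covered, total)         # P(A)
    precision = Fraction(covered_pos, covered)  # P(C|A)
    prior = Fraction(pos, total)                # P(C)
    return coverage * (precision - prior)


# Hypothetical counts for one rule at two sites:
# (covered positives, covered examples, all positives, all examples)
sites = {
    "site_a": (40, 50, 50, 100),
    "site_b": (5, 20, 80, 100),
}

# Local evaluation: each site scores the rule on its own data only.
local_scores = {name: wracc(*counts) for name, counts in sites.items()}

# Global evaluation: pool the counts of all sites, then score once.
pooled = tuple(sum(counts[i] for counts in sites.values()) for i in range(4))
global_score = wracc(*pooled)

for name, score in local_scores.items():
    print(name, float(score))
print("global", float(global_score))
```

With these invented counts the rule scores positively at `site_a` and positively on average across the two sites, yet its utility on the pooled data is slightly negative. This illustrates, under the stated assumptions only, why locally evaluated utilities need not agree with the global one once distributions deviate.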