Secure top-k subgroup discovery

Authors:
Henrik Grosskreutz; Benedikt Lemmen; Stefan Rüping
Affiliations:
Fraunhofer IAIS, Schloss Birlinghoven, Sankt Augustin, Germany (all three authors)
Venue:
PSDML'10: Proceedings of the International ECML/PKDD Conference on Privacy and Security Issues in Data Mining and Machine Learning
Year:
2010



Abstract

Supervised descriptive rule discovery techniques such as subgroup discovery are popular in applications like fraud detection and clinical studies. Compared with other descriptive techniques, such as classical support/confidence association rules, subgroup discovery has two advantages: it returns only the top-k patterns, and it uses a quality function that steers the search away from patterns uncorrelated with the target. If these techniques are to be applied in privacy-sensitive scenarios involving distributed data, precise guarantees are needed on the amount of information leaked while the data mining runs. Unfortunately, secure multi-party protocols for classical support/confidence association rule mining cannot be adapted to subgroup discovery, for fundamental reasons: the different quality function and the restriction to a fixed number of patterns, i.e., exactly the features that make subgroup discovery desirable. In this paper, we present a new protocol that allows distributed subgroup discovery without disclosing the individual databases. We analyze the properties of the protocol, describe a prototype implementation, and present experiments that demonstrate the feasibility of the approach.
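To make the two properties of subgroup discovery named in the abstract concrete (a quality function that gives no credit to patterns uncorrelated with the target, and a result limited to the top k patterns), here is a minimal centralized sketch. It assumes the Piatetsky-Shapiro quality function q = n * (p - p0), one common choice in the subgroup discovery literature; the abstract does not say which quality function the paper uses. The data, attribute names and function names (`piatetsky_shapiro`, `top_k_subgroups`, `ROWS`) are hypothetical. The sketch is neither distributed nor privacy-preserving: it illustrates the task the paper's protocol computes, not the protocol itself.

```python
from itertools import combinations
import heapq

# Hypothetical toy data: "region" is uncorrelated with "fraud", "channel" is correlated.
ROWS = [
    {"region": "north", "channel": "web", "fraud": 1},
    {"region": "north", "channel": "web", "fraud": 1},
    {"region": "north", "channel": "store", "fraud": 0},
    {"region": "north", "channel": "store", "fraud": 0},
    {"region": "south", "channel": "web", "fraud": 1},
    {"region": "south", "channel": "web", "fraud": 1},
    {"region": "south", "channel": "store", "fraud": 0},
    {"region": "south", "channel": "store", "fraud": 0},
]


def piatetsky_shapiro(n_sub, pos_sub, p0):
    """Assumed quality function q = n * (p - p0).

    It is zero whenever the target rate p inside the subgroup equals the
    overall target rate p0, i.e. when the pattern is uncorrelated with the target.
    """
    if n_sub == 0:
        return 0.0
    return n_sub * (pos_sub / n_sub - p0)


def top_k_subgroups(rows, target, k, max_depth=2):
    """Score every conjunction of up to max_depth attribute=value selectors; keep the k best."""
    p0 = sum(r[target] for r in rows) / len(rows)
    selectors = sorted({(a, v) for r in rows for a, v in r.items() if a != target})
    heap = []  # min-heap of (quality, description); holds the current top-k
    for depth in range(1, max_depth + 1):
        for desc in combinations(selectors, depth):
            # Skip contradictory conjunctions such as region=north AND region=south.
            if len({a for a, _ in desc}) < depth:
                continue
            covered = [r for r in rows if all(r[a] == v for a, v in desc)]
            q = piatetsky_shapiro(len(covered), sum(r[target] for r in covered), p0)
            item = (q, desc)  # ties are broken lexicographically by description
            if len(heap) < k:
                heapq.heappush(heap, item)
            elif item > heap[0]:
                heapq.heapreplace(heap, item)
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    for quality, description in top_k_subgroups(ROWS, "fraud", k=3):
        print(quality, description)
```

In this toy data, region=north covers half of the records but has the same fraud rate (50%) as the whole dataset, so its quality is 0 and it does not make the top 3, which are channel=web (q = 2.0) and its two refinements by region (q = 1.0 each). The rule region=north → fraud=1 still has 25% support and 50% confidence, so a support/confidence miner with thresholds at or below those values would report it even though region says nothing about fraud.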