Secure top-k subgroup discovery

  • Authors:
  • Henrik Grosskreutz;Benedikt Lemmen;Stefan Rüping

  • Affiliations:
  • Fraunhofer IAIS, Schloss Birlinghoven, Sankt Augustin, Germany;Fraunhofer IAIS, Schloss Birlinghoven, Sankt Augustin, Germany;Fraunhofer IAIS, Schloss Birlinghoven, Sankt Augustin, Germany

  • Venue:
  • PSDML'10 Proceedings of the international ECML/PKDD conference on Privacy and security issues in data mining and machine learning
  • Year:
  • 2010

Abstract

Supervised descriptive rule discovery techniques such as subgroup discovery are popular in applications such as fraud detection and clinical studies. Compared with other descriptive techniques, such as classical support/confidence association rules, subgroup discovery has two advantages: it returns only the top-k patterns, and it uses a quality function that avoids patterns uncorrelated with the target. If these techniques are to be applied in privacy-sensitive scenarios involving distributed data, precise guarantees are needed regarding the amount of information leaked during the execution of the data mining process. Unfortunately, secure multi-party protocols for classical support/confidence association rule mining cannot be adapted to subgroup discovery, for fundamental reasons: the obstacles are the different quality function and the restriction to a fixed number of patterns, i.e., exactly the desired features of subgroup discovery. In this paper, we present a new protocol that allows distributed subgroup discovery while avoiding the disclosure of the individual databases. We analyze the properties of the protocol, describe a prototypical implementation, and present experiments that demonstrate the feasibility of the approach.
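
To make the notions of "top-k patterns" and "quality function" concrete, the following is a minimal, non-distributed sketch of top-k subgroup discovery on a toy table. It does not reproduce the secure protocol of the paper. The abstract does not name a quality function, so the Piatetsky-Shapiro quality n * (p - p0) used here is an assumption, and the function names, attributes, and data are hypothetical.

```python
from itertools import combinations


def ps_quality(rows, labels, description):
    """Piatetsky-Shapiro quality n * (p - p0) (assumed; not named in the abstract).

    n  = number of rows covered by the description
    p  = share of positive labels among the covered rows
    p0 = share of positive labels in the whole data set
    """
    covered = [y for x, y in zip(rows, labels)
               if all(x.get(attr) == val for attr, val in description)]
    n = len(covered)
    if n == 0:
        return 0.0
    p = sum(covered) / n
    p0 = sum(labels) / len(labels)
    return n * (p - p0)


def top_k_subgroups(rows, labels, k=3, max_len=2):
    """Exhaustively score conjunctions of attribute=value tests and keep the k best."""
    selectors = sorted({(a, v) for x in rows for a, v in x.items()})
    scored = []
    for length in range(1, max_len + 1):
        for desc in combinations(selectors, length):
            # Skip descriptions that constrain the same attribute twice.
            if len({a for a, _ in desc}) < length:
                continue
            scored.append((ps_quality(rows, labels, desc), desc))
    # Stable sort by quality only, highest first.
    return sorted(scored, key=lambda t: t[0], reverse=True)[:k]


# Hypothetical toy data: label 1 marks the target class.
rows = [
    {"age": "old", "smoker": "yes"},
    {"age": "old", "smoker": "no"},
    {"age": "young", "smoker": "yes"},
    {"age": "young", "smoker": "no"},
    {"age": "young", "smoker": "no"},
    {"age": "old", "smoker": "yes"},
]
labels = [1, 1, 1, 0, 0, 1]

top = top_k_subgroups(rows, labels, k=2)
for quality, desc in top:
    print(round(quality, 3), desc)
```

Because the quality weighs the deviation of p from p0 by the coverage n, a description whose covered rows have the same target share as the whole data set scores zero, which is how such a function keeps patterns uncorrelated with the target out of the result. An exhaustive search like this one is only feasible for tiny inputs.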