Privacy-preserving Naïve Bayes classification

  • Authors:
  • Jaideep Vaidya;Murat Kantarcıoğlu;Chris Clifton

  • Affiliations:
  • Rutgers University, Newark, USA;University of Texas at Dallas, Dallas, USA;Purdue University, West Lafayette, USA

  • Venue:
  • The VLDB Journal — The International Journal on Very Large Data Bases
  • Year:
  • 2008

Quantified Score

Hi-index 0.01

Abstract

Privacy-preserving data mining (developing models without seeing the data) is receiving growing attention. This paper assumes a privacy-preserving distributed data mining scenario: data sources collaborate to develop a global model, but must not disclose their data to others. Secure distributed classification is an important problem: in many situations, data is split between multiple organizations. These organizations may want to use all of the data to create more accurate predictive models while revealing neither their training data nor the instances to be classified. Naïve Bayes is often used as a baseline classifier, consistently providing reasonable classification performance. This paper brings privacy preservation to that baseline, presenting protocols to develop a Naïve Bayes classifier on both vertically and horizontally partitioned data.
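To make the horizontally partitioned setting concrete, here is a minimal sketch of how parties holding different rows of the same schema might pool the counts a Naïve Bayes model needs without exchanging raw records. It assumes a masked, ring-style secure sum, which is a common building block in this area and not necessarily the protocol the paper presents. The names `secure_sum`, `local_counts`, `global_counts`, and `MODULUS` are hypothetical, the parties are simulated in a single process, and the set of attribute values is treated as public.

```python
import secrets
from collections import Counter

# Assumed public modulus, chosen to exceed any possible total count.
MODULUS = 2**61 - 1


def secure_sum(local_values, modulus=MODULUS):
    """Simulate a masked ring sum over the parties' private integers.

    The initiator starts from a random mask, each party adds its own value
    to the running total, and the initiator removes the mask at the end.
    Each party only ever sees a masked running total.
    """
    mask = secrets.randbelow(modulus)
    running = mask
    for value in local_values:  # one loop iteration stands for one party's turn
        running = (running + value) % modulus
    return (running - mask) % modulus


def local_counts(rows):
    """Count classes and (attribute index, value, class) triples for one party."""
    class_counts, feature_counts = Counter(), Counter()
    for features, label in rows:
        class_counts[label] += 1
        for index, value in enumerate(features):
            feature_counts[(index, value, label)] += 1
    return class_counts, feature_counts


def global_counts(parties):
    """Combine per-party counts with secure_sum, one sum per count key."""
    per_party = [local_counts(rows) for rows in parties]
    # Simplification: the key sets are unioned in the clear here; a real
    # deployment would fix the attribute domains in advance.
    class_keys = {key for classes, _ in per_party for key in classes}
    feature_keys = {key for _, feats in per_party for key in feats}
    class_totals = {
        key: secure_sum([classes[key] for classes, _ in per_party])
        for key in class_keys
    }
    feature_totals = {
        key: secure_sum([feats[key] for _, feats in per_party])
        for key in feature_keys
    }
    return class_totals, feature_totals


# Two hypothetical parties, each holding different rows of the same schema.
party_a = [(("sunny",), "yes"), (("rainy",), "no")]
party_b = [(("sunny",), "yes"), (("sunny",), "no")]
class_totals, feature_totals = global_counts([party_a, party_b])
```

The pooled counts are exactly what a centralized Naïve Bayes trainer would compute from the combined rows, so the conditional probabilities follow by division. This sketch ignores collusion between parties and the information revealed by the totals themselves, both of which a full protocol has to address.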