Privacy preserving feature selection for distributed data using virtual dimension

  • Authors:
  • Madhushri Banerjee; Sumit Chakravarty

  • Affiliations:
  • Georgia Gwinnett College, Lawrenceville, GA, USA; Stinger & Ghaffarian Technologies Inc., Greenbelt, MD, USA

  • Venue:
  • Proceedings of the 20th ACM international conference on Information and knowledge management
  • Year:
  • 2011

Quantified Score

Hi-index 0.02

Abstract

Data mining often suffers from the curse of dimensionality: a huge number of dimensions, or attributes, in the data poses serious problems for data mining tasks. Traditionally, dimensionality reduction techniques such as Principal Component Analysis have been used to address this problem. However, it may be necessary to remain in the original attribute space and identify the key predictive attributes rather than move to a transformed space. As a result, feature subset selection has become an important area of research over the last few years. With the advent of network technologies, data is sometimes distributed across multiple locations and often held by multiple parties. The biggest concern when sharing such data is privacy. This paper proposes a secure distributed protocol that allows multiple parties to perform feature selection without revealing their own data. The proposed distributed feature selection method evolved from virtual dimension reduction, a method used in hyperspectral image processing to select a subset of hyperspectral bands for further analysis. Experimental results on real-life datasets demonstrate the effectiveness of the proposed method.
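
The abstract does not spell out the protocol, so the following is only an illustrative sketch of one building block commonly used in privacy-preserving distributed mining: a secure sum based on additive secret sharing. It assumes horizontally partitioned data (each party holds different rows over the same attributes) with small non-negative integer values, and it ranks attributes by pooled variance purely as a stand-in criterion. It is not the virtual dimension method of the paper, and the names `MODULUS`, `make_shares`, `secure_sum`, and `pooled_variances` are hypothetical.

```python
import secrets

# Public modulus; assumed to be larger than any true total being summed.
MODULUS = 2**61 - 1


def make_shares(value, n_parties):
    """Split an integer into n additive shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(local_values):
    """Return the total of the parties' values using only random-looking shares.

    shares[i][j] is the share that party i would send to party j. Each party
    publishes only the sum of the shares it received, never its own input.
    """
    n = len(local_values)
    shares = [make_shares(v, n) for v in local_values]
    partials = [sum(shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    return sum(partials) % MODULUS


def pooled_variances(party_rows):
    """Per-attribute variance over all parties' rows, from secure sums only."""
    n_features = len(party_rows[0][0])
    total_n = secure_sum([len(rows) for rows in party_rows])
    variances = []
    for f in range(n_features):
        s = secure_sum([sum(r[f] for r in rows) for rows in party_rows])
        sq = secure_sum([sum(r[f] ** 2 for r in rows) for rows in party_rows])
        mean = s / total_n
        variances.append(sq / total_n - mean ** 2)
    return variances


# Toy data (invented for illustration): two parties, three attributes.
party_a = [[1, 5, 0], [2, 5, 9]]
party_b = [[3, 5, 1], [4, 5, 8]]

variances = pooled_variances([party_a, party_b])
# Attribute indices ordered from highest to lowest pooled variance.
ranking = sorted(range(len(variances)), key=lambda f: variances[f], reverse=True)
print(variances, ranking)
```

In this sketch each party learns only the pooled count, sum, and sum of squares per attribute, not another party's rows. A real deployment would also need fixed-point encoding for non-integer data and protection against colluding parties, neither of which is covered here.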