Privacy-aware regression modeling of participatory sensing data

  • Authors:
  • Hossein Ahmadi;Nam Pham;Raghu Ganti;Tarek Abdelzaher;Suman Nath;Jiawei Han

  • Affiliations:
  • University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;IBM T. J. Watson Research Center;University of Illinois at Urbana-Champaign;Networked Embedded Computing Group, Microsoft Research;University of Illinois at Urbana-Champaign

  • Venue:
  • Proceedings of the 8th ACM Conference on Embedded Networked Sensor Systems
  • Year:
  • 2010

Abstract

Many participatory sensing applications use data collected by participants to construct a public model of a system or phenomenon. For example, a health application might compute a model relating exercise and diet to the amount of weight lost. While the final model could be public, the individual input and output data traces used to construct it may be private to the participants (e.g., their individual food intake, lifestyle choices, and resulting weight). This paper proposes and experimentally studies a technique that attempts to keep such input and output data traces private while still allowing accurate model construction. The technique differs significantly from perturbation-based techniques in that no noise is added. The main contribution of the paper is a data transformation at the client side that helps keep the client data private without introducing any additional error into model construction. We focus in particular on linear regression models, which are widely used in participatory sensing applications. We evaluate our scheme on a data set from a map-based participatory sensing service: a green navigation service that constructs regression models from participant data to predict the fuel consumption of vehicles on road segments. The evaluation provides empirical evidence that: i) an individual data trace is generally hard to reconstruct with any reasonable accuracy, and ii) the regression model constructed using the transformed traces has a much smaller error than one based on additive data-perturbation schemes.
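
The abstract does not spell out the client-side transformation, so the following is only an illustrative sketch of why a noise-free transformation can leave a linear regression fit unchanged. It relies on a standard property of ordinary least squares: the coefficients depend on the data only through the aggregates X^T X and X^T y, which add up across participants. The function names, the synthetic data, and the choice of these aggregates are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def client_summary(X, y):
    """Hypothetical client step: reduce a private trace to aggregate matrices."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return X.T @ X, X.T @ y

def fit_from_summaries(summaries):
    """Hypothetical server step: sum the aggregates, solve the normal equations."""
    xtx = sum(s[0] for s in summaries)
    xty = sum(s[1] for s in summaries)
    return np.linalg.solve(xtx, xty)

def make_demo_traces(seed=0, clients=5, samples=40):
    """Synthetic stand-in data; not the green navigation data set."""
    rng = np.random.default_rng(seed)
    true_w = np.array([2.0, -1.0, 0.5])
    traces = []
    for _ in range(clients):
        X = rng.normal(size=(samples, 3))
        y = X @ true_w + 0.1 * rng.normal(size=samples)
        traces.append((X, y))
    return traces

traces = make_demo_traces()

# Model built only from per-client aggregates (raw traces never pooled).
w_private = fit_from_summaries([client_summary(X, y) for X, y in traces])

# Reference model built from the pooled raw traces.
X_all = np.vstack([X for X, _ in traces])
y_all = np.concatenate([y for _, y in traces])
w_pooled, *_ = np.linalg.lstsq(X_all, y_all, rcond=None)

# The two fits agree up to floating-point error.
print(np.allclose(w_private, w_pooled))
```

The sketch only illustrates the accuracy side of the claim. It says nothing about how hard it is to reconstruct an individual trace from such aggregates, which the paper assesses empirically.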