Identifying a critical threat to privacy through automatic image classification

  • Authors:
  • David Lorenzi; Jaideep Vaidya

  • Affiliations:
  • Rutgers University, Newark, NJ, USA; Rutgers University, Newark, NJ, USA

  • Venue:
  • Proceedings of the First ACM Conference on Data and Application Security and Privacy
  • Year:
  • 2011

Abstract

Image classification is generally considered a hard problem, even though it is necessary for many useful applications such as automatic target recognition; indeed, no general methods exist that work across varying scenarios while still achieving good performance across the board. In this paper, we identify a very interesting problem where image classification is dangerously easy. We look at image classification in the specific context of accurately classifying images containing highly sensitive data such as driver's licenses, credit cards, and passports. Our key contribution is a Hierarchical Temporal Memory (HTM) network that classifies many sensitive images with over 90% accuracy, which we use to build a system that automatically derives and transcribes sensitive information from image data. The system classifies images into two groups, sensitive and non-sensitive, and the group of sensitive images can then be further analyzed. This is a real-world security issue that could easily lead to privacy problems such as identity theft, since scans of passports and driver's licenses are routinely emailed or kept in digital form, and many local documents are left unencrypted. Essentially, an attacker can use data mining and machine learning techniques very effectively to breach individual privacy. Our main contribution is thus to demonstrate the efficacy of image classification for deriving sensitive information, which could also serve as a guide for other interesting applications such as document detection and analysis. It also serves as a warning against leaving data unencrypted and shows once again that security through obscurity is simply not enough.
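To make the two-stage pipeline described above concrete, the sketch below shows the general shape of such a system: train a binary classifier to separate sensitive from non-sensitive images, then forward the sensitive ones to a transcription step. This is a minimal illustration, not the authors' implementation; the paper uses an HTM network, whose configuration is not given in the abstract, so a linear SVM stands in as a generic classifier. The directory layout, image size, and file names are assumptions made purely for illustration.

```python
# Minimal sketch of a sensitive/non-sensitive image classification pipeline.
# NOTE: this is an illustrative stand-in, not the paper's HTM-based system.
# Assumed (hypothetical) layout: images/sensitive/*.png, images/non_sensitive/*.png

from pathlib import Path

import numpy as np
from PIL import Image
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC


def load_dataset(root: Path, size=(32, 32)):
    """Load images as flattened grayscale vectors with binary labels."""
    features, labels = [], []
    for label, folder in enumerate(["non_sensitive", "sensitive"]):
        for img_path in (root / folder).glob("*.png"):
            img = Image.open(img_path).convert("L").resize(size)
            features.append(np.asarray(img, dtype=np.float32).ravel() / 255.0)
            labels.append(label)
    return np.array(features), np.array(labels)


def main():
    X, y = load_dataset(Path("images"))  # "images/" is a placeholder directory
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0, stratify=y
    )

    clf = LinearSVC()  # generic stand-in for the paper's HTM network
    clf.fit(X_train, y_train)
    print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))

    # Stage 2 (not shown): images predicted as sensitive would be handed to an
    # OCR/transcription step to derive fields such as names or document numbers.


if __name__ == "__main__":
    main()
```

The point of the sketch is only to show how little machinery an attacker needs once unencrypted scans are available: any reasonably accurate binary classifier, followed by off-the-shelf text extraction, reproduces the threat the paper describes.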