Private Data Discovery for Privacy Compliance in Collaborative Environments

  • Authors:
  • Larry Korba; Yunli Wang; Liqiang Geng; Ronggong Song; George Yee; Andrew S. Patrick; Scott Buffett; Hongyu Liu; Yonghua You

  • Affiliations:
  • Institute for Information Technology, National Research Council of Canada, Ottawa K1A 0R6 (all authors)

  • Venue:
  • CDVE '08 Proceedings of the 5th international conference on Cooperative Design, Visualization, and Engineering
  • Year:
  • 2008

Abstract

With the growing use of computers and the Internet, it has become difficult for organizations to locate and effectively manage sensitive personally identifiable information (PII). The problem is even more evident in collaborative computing environments. PII may be hidden anywhere within a computer's file system, and in the course of different activities, collaborative or not, it may migrate from computer to computer. This makes meeting organizational privacy requirements all the more complex. Our particular interest is in developing technology that automatically discovers workflows across organizational collaborators that involve private data. Because in this context it is important to understand where and when private data is found, this paper focuses on PII discovery, i.e., automatically identifying private data in semi-structured and unstructured (free-text) documents. The first part of the process identifies PII via named entity recognition. The second part determines relationships between those entities using a supervised machine learning method. We present test results for our methods on publicly available data generated from different collaborative activities to assess scalability in cooperative computing environments.
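
The two-part process described in the abstract can be illustrated with a small sketch. The abstract does not specify the recognizer or the learning method, so everything below is an assumption: simple regular expressions stand in for named entity recognition, and the second function only builds candidate entity pairs with basic features of the kind a supervised classifier might consume. The names PII_PATTERNS, find_entities, pair_features, and max_gap are hypothetical.

```python
import re
from itertools import combinations

# Hypothetical patterns standing in for a trained named entity recognizer.
# A real system would use a proper NER model; these are illustration only.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "NAME": re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"),
}


def find_entities(text):
    """Part 1 (stand-in): return (label, value, start_offset) per candidate entity."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((label, match.group(), match.start()))
    # Sort by position so later pairing follows document order.
    return sorted(found, key=lambda entity: entity[2])


def pair_features(text, max_gap=60):
    """Part 2 (input only): candidate entity pairs with simple features.

    The relationship decision itself would come from a supervised model
    trained on labeled pairs; that model is not shown here.
    """
    pairs = []
    for a, b in combinations(find_entities(text), 2):
        gap = b[2] - (a[2] + len(a[1]))  # characters between the two entities
        # Keep only non-overlapping, nearby entities of different types.
        if a[0] != b[0] and 0 <= gap <= max_gap:
            pairs.append(
                {"left": a[1], "right": b[1], "types": (a[0], b[0]), "gap": gap}
            )
    return pairs


sample = "Jane Doe can be reached at jane.doe@example.org or 613-555-0199."
entities = find_entities(sample)
candidates = pair_features(sample)
```

On the sample sentence, the sketch finds a name, an email address, and a phone number, and proposes each nearby pair as a candidate relationship. Deciding which candidates truly link PII to a person is the step the paper assigns to supervised learning.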