Private Data Discovery for Privacy Compliance in Collaborative Environments

Authors:
Larry Korba; Yunli Wang; Liqiang Geng; Ronggong Song; George Yee; Andrew S. Patrick; Scott Buffett; Hongyu Liu; Yonghua You
Affiliations:
Institute for Information Technology, National Research Council of Canada, Ottawa K1A 0R6 (all authors)
Venue:
CDVE '08 Proceedings of the 5th international conference on Cooperative Design, Visualization, and Engineering
Year:
2008

Abstract

With the growing use of computers and the Internet, it has become difficult for organizations to locate and effectively manage sensitive personally identifiable information (PII). This problem becomes even more evident in collaborative computing environments. PII may be hidden anywhere within the file system of a computer. As well, in the course of different activities, whether collaborative or not, PII may migrate from computer to computer. This makes meeting organizational privacy requirements all the more complex. Our particular interest is to develop technology that automatically discovers workflows across organizational collaborators that involve private data. Since in this context it is important to understand where and when private data is discovered, this paper focuses on PII discovery, i.e., automatically identifying private data existing in semi-structured and unstructured (free text) documents. The first part of the process identifies PII via named entity recognition. The second part determines relationships between those entities using a supervised machine learning method. We present test results of our methods using publicly available data generated from different collaborative activities to assess scalability in cooperative computing environments.
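
The two-part process described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the entity recognizer or the learning algorithm, so this sketch assumes regular-expression patterns as a stand-in for named entity recognition and a small naive Bayes classifier as the supervised relation learner. The entity labels, the relation labels (`BELONGS_TO`, `NONE`), the training sentences, and all function names are hypothetical.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations
from math import log

# Stage 1 (assumption): regex patterns stand in for a real NER model.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "PERSON": re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"),
}

def find_entities(text):
    """Return (label, start, end, surface) tuples sorted by position."""
    found = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append((label, m.start(), m.end(), m.group()))
    return sorted(found, key=lambda e: e[1])

def candidate_pairs(text):
    """Pair each PERSON with every non-PERSON entity that follows it."""
    ents = find_entities(text)
    return [(a, b) for a, b in combinations(ents, 2)
            if a[0] == "PERSON" and b[0] != "PERSON"]

def pair_features(text, e1, e2):
    """Features: the entity type pair plus the words between the entities."""
    between = text[e1[2]:e2[1]].lower()
    return [f"types={e1[0]}-{e2[0]}"] + [f"w={w}" for w in re.findall(r"[a-z]+", between)]

# Stage 2 (assumption): multinomial naive Bayes with add-one smoothing.
class NaiveBayes:
    def __init__(self):
        self.class_counts = Counter()
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, examples):
        for feats, label in examples:
            self.class_counts[label] += 1
            for f in feats:
                self.feat_counts[label][f] += 1
                self.vocab.add(f)

    def predict(self, feats):
        total = sum(self.class_counts.values())
        best, best_score = None, float("-inf")
        for label, n in self.class_counts.items():
            denom = sum(self.feat_counts[label].values()) + len(self.vocab)
            score = log(n / total)
            for f in feats:
                score += log((self.feat_counts[label][f] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best

# Hypothetical hand-labeled training sentences (one person/contact pair each).
TRAIN = [
    ("John Doe can be reached at john@corp.org", "BELONGS_TO"),
    ("Mary Jones can be reached at 613-555-0101", "BELONGS_TO"),
    ("Bob Lee uses the address bob@site.net", "BELONGS_TO"),
    ("Tom Hill forwarded the report to sales@corp.org", "NONE"),
    ("Ann Ray complained about calls from 800-555-0199", "NONE"),
]

def train(sentences):
    model = NaiveBayes()
    model.fit([(pair_features(s, a, b), label)
               for s, label in sentences
               for a, b in candidate_pairs(s)])
    return model

def discover(text, model):
    """Return (person, contact, predicted relation) for each candidate pair."""
    return [(a[3], b[3], model.predict(pair_features(text, a, b)))
            for a, b in candidate_pairs(text)]

model = train(TRAIN)
for row in discover("Alice Smith can be reached at alice@example.com", model):
    print(row)
```

Separating recognition from relation classification mirrors the abstract's two stages: either stage could be replaced (for example, with a trained NER model or a different classifier) without changing the other.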