With the growing use of computers and the Internet, it has become difficult for organizations to locate and effectively manage sensitive personally identifiable information (PII). The problem is even more evident in collaborative computing environments. PII may be hidden anywhere within a computer's file system. Moreover, in the course of different activities, collaborative or not, PII may migrate from computer to computer, which makes meeting organizational privacy requirements all the more complex. Our particular interest is to develop technology that automatically discovers workflows among organizational collaborators that involve private data. Because in this context it is important to understand where and when private data is discovered, this paper focuses on PII discovery, i.e., automatically identifying private data present in semi-structured and unstructured (free text) documents. The first part of the process identifies PII via named entity recognition. The second part determines relationships between those entities using a supervised machine learning method. We present test results of our methods on publicly available data generated from different collaborative activities, to provide an assessment of scalability in cooperative computing environments.
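The two-part process described above can be illustrated with a minimal sketch. It is not the authors' implementation: the regular expressions stand in for a trained named entity recognizer, and the names `PII_PATTERNS`, `find_pii`, `candidate_pairs`, and the `max_gap` threshold are all hypothetical. The second function only generates candidate entity pairs by proximity; in a full system such pairs would be passed to a supervised relation classifier, which is omitted here.

```python
import re
from itertools import combinations

# Hypothetical patterns standing in for a trained named entity recognizer.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text):
    """Return (label, matched_text, start_offset) tuples sorted by offset."""
    hits = []
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group(), match.start()))
    return sorted(hits, key=lambda hit: hit[2])


def candidate_pairs(hits, max_gap=80):
    """Pair entities whose start offsets are within max_gap characters.

    These pairs are only candidates; deciding whether two entities are
    actually related is the job of a separate supervised classifier.
    """
    return [
        (a, b)
        for a, b in combinations(hits, 2)
        if abs(b[2] - a[2]) <= max_gap
    ]


sample = "Contact Jane at jane.doe@example.org or 613-555-0199."
entities = find_pii(sample)
pairs = candidate_pairs(entities)
```

Recording the character offset of each match keeps track of where in a document the private data was found, which is the kind of location information the abstract identifies as important.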