Using Data Mining Methods to Predict Personally Identifiable Information in Emails

Authors:
Liqiang Geng; Larry Korba; Xin Wang; Yunli Wang; Hongyu Liu; Yonghua You
Affiliations:
Institute of Information Technology, National Research Council of Canada, Fredericton, New Brunswick, Canada; Institute of Information Technology, National Research Council of Canada, Fredericton, New Brunswick, Canada; Department of Geomatics Engineering, University of Calgary, Calgary, Canada; Institute of Information Technology, National Research Council of Canada, Fredericton, New Brunswick, Canada; Institute of Information Technology, National Research Council of Canada, Fredericton, New Brunswick, Canada; Institute of Information Technology, National Research Council of Canada, Fredericton, New Brunswick, Canada
Venue:
ADMA '08 Proceedings of the 4th international conference on Advanced Data Mining and Applications
Year:
2008

Abstract

Private information management and compliance are important issues for most organizations. As a major communication tool for organizations, email is one of many potential sources of privacy leaks. Information extraction methods have been applied to detect private information in text files. However, because email messages usually consist of low-quality text, these methods may not perform well on email. In this paper, we address the problem of predicting the presence of private information in email using data mining and text mining methods. We propose two prediction models. The first uses association rules to predict one type of private information from other types of private information already identified in an email. The second uses classification models to predict private information from the content of the email. Experiments on the Enron email dataset show promising results.
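
To make the first model more concrete, the following is a minimal sketch of association-rule prediction over the types of private information found in each email. It is not the authors' implementation: the abstract does not state which mining algorithm or thresholds were used, so the brute-force enumeration, the function name `mine_rules`, the thresholds, and the toy labels (`name`, `phone`, `address`, and so on) are all assumptions made for illustration.

```python
from itertools import combinations


def mine_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return rules (antecedent, consequent, support, confidence).

    Each transaction is the set of private-information types detected in
    one email. Antecedents have one or two items; consequents have one.
    Brute-force enumeration is used for clarity (an assumption, not the
    paper's stated method).
    """
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]
    items = sorted(set().union(*sets))

    def support(itemset):
        # Fraction of emails that contain every item in the itemset.
        return sum(1 for s in sets if itemset <= s) / n

    rules = []
    for size in (1, 2):
        for antecedent in combinations(items, size):
            a = frozenset(antecedent)
            sup_a = support(a)
            if sup_a == 0:
                continue
            for consequent in items:
                if consequent in a:
                    continue
                sup_rule = support(a | {consequent})
                confidence = sup_rule / sup_a
                if sup_rule >= min_support and confidence >= min_confidence:
                    rules.append((antecedent, consequent, sup_rule, confidence))
    return rules


# Hypothetical toy data: one set of detected types per email.
emails = [
    {"name", "phone", "address"},
    {"name", "phone"},
    {"name", "email_address"},
    {"name", "phone", "address"},
    {"credit_card", "name", "address"},
]

rules = mine_rules(emails)
for antecedent, consequent, sup, conf in rules:
    print(f"{set(antecedent)} -> {consequent} (support={sup:.2f}, confidence={conf:.2f})")
```

Under these assumptions, a rule such as "phone implies name" could flag an email for a name check whenever a phone number has been detected, even if the name itself was missed by the extractor.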