Information leak detection in financial e-mails using mail pattern analysis under partial information

Authors:
Chetan Kalyan;Krithika Chandrasekaran
Affiliations:
Computer Science and Engineering, RNS Institute of Technology, Bangalore, India;Electronics and Communication Engineering, PES Institution of Technology, Bangalore, India
Venue:
AIC'07 Proceedings of the 7th WSEAS International Conference on Applied Informatics and Communications - Volume 7
Year:
2007

Abstract

Since the advent of e-mail, leakage of sensitive information has become a daunting problem. The volume of mail leaving a company is often so large that manual monitoring is impossible. Automatic screening mostly relies on content scanning, but sometimes the information is so sensitive that even scanning of the mails by a third party is not permitted, and detection under such restrictions becomes difficult. Moreover, mails originating from specific organizations are often restricted in subject and content, suggesting that powerful generic techniques such as content scanning may not be needed. We propose that selecting input variables relevant to the domain can help in such cases: a simple, straightforward learning scheme can then detect information leaks efficiently using only mail pattern analysis. We applied our technique to real-life mails from financial institutions. By choosing the input variables judiciously, we were able to learn the mail patterns well and detect violations efficiently. The preliminary results are encouraging, with an accuracy close to 92%. The technique is now being implemented in a real-life commercial tool.
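
The abstract does not list the input variables or the learning scheme, so the following Python sketch only illustrates the general idea of detecting unusual mails from metadata alone, without reading subject or body. Everything in it is an assumption made for illustration: the `MailMeta` fields, the per-sender `SenderProfile`, the 3-standard-deviation size rule, and the `threshold` of 2 are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class MailMeta:
    # Hypothetical metadata-only record; no subject or body is ever read.
    sender: str
    recipient_domain: str
    hour: int            # hour of day the mail was sent, 0-23
    size_kb: float
    attachments: int


class SenderProfile:
    """Per-sender summary of past mails that are assumed to be legitimate."""

    def __init__(self, mails):
        self.domains = {m.recipient_domain for m in mails}
        self.hours = {m.hour for m in mails}
        sizes = [m.size_kb for m in mails]
        self.size_mean = mean(sizes)
        # Floor the spread so a constant-size history cannot yield a zero width.
        self.size_std = max(pstdev(sizes), 1.0)
        self.max_attachments = max(m.attachments for m in mails)

    def score(self, m):
        """Count how many variables deviate from the learned pattern (0 to 4)."""
        deviations = 0
        deviations += m.recipient_domain not in self.domains
        deviations += m.hour not in self.hours
        deviations += abs(m.size_kb - self.size_mean) > 3 * self.size_std
        deviations += m.attachments > self.max_attachments
        return deviations


def train(history):
    """Group historical mails by sender and build one profile per sender."""
    by_sender = defaultdict(list)
    for m in history:
        by_sender[m.sender].append(m)
    return {s: SenderProfile(ms) for s, ms in by_sender.items()}


def is_suspicious(profiles, m, threshold=2):
    """Flag a mail when enough variables deviate from the sender's pattern."""
    profile = profiles.get(m.sender)
    if profile is None:
        return True  # unseen sender: there is no pattern to compare against
    return profile.score(m) >= threshold


# Toy history with made-up addresses under the reserved .example domain.
history = [
    MailMeta("alice@bank.example", "client.example", 10, 40.0, 0),
    MailMeta("alice@bank.example", "client.example", 11, 55.0, 1),
    MailMeta("alice@bank.example", "auditor.example", 15, 48.0, 0),
]
profiles = train(history)

normal = MailMeta("alice@bank.example", "client.example", 10, 50.0, 1)
odd = MailMeta("alice@bank.example", "webmail.example", 23, 900.0, 3)
print(is_suspicious(profiles, normal), is_suspicious(profiles, odd))
```

In this toy run the first mail matches the sender's history on every variable, while the second deviates in recipient domain, send hour, size, and attachment count, so only the second is flagged. A real deployment would need labelled data to choose the variables and threshold, which is the selection step the abstract emphasizes.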