With the advent of e-mail, leakage of sensitive information has become a serious problem. The volume of mail leaving a company is often so large that manual monitoring is impossible. Automatic screening mostly relies on content scanning, but some information is so sensitive that even scanning by a third party may not be permitted, which makes detection under such restrictions difficult. Moreover, mail originating from a specific organization is often restricted in subject and content, suggesting that powerful generic techniques such as content scanning may not be needed. We propose that selecting input variables relevant to the domain can help in such cases: a simple, straightforward learning scheme can then detect information leaks efficiently using only mail pattern analysis. We applied our technique to real-life mail from financial institutions. By choosing the input variables judiciously, we were able to learn the mail patterns well and detect violations efficiently. The preliminary results are encouraging, with an accuracy close to 92%. The technique is now being implemented in a real-life commercial tool.
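
As a rough illustration of what content-free mail pattern analysis could look like, the Python sketch below learns which metadata patterns are normal and flags mail that falls outside them. It is not the authors' method: the abstract does not name its input variables or learning scheme, so the fields in `MailMeta`, the `features` function, the `PatternModel` class, and the `min_count` threshold are all hypothetical choices made for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class MailMeta:
    """Header-level metadata only; the message body is never inspected.
    All fields are assumed for illustration."""
    sender: str
    recipient: str
    hour: int         # hour of day the mail was sent, 0-23
    attachments: int  # number of attachments


def features(mail):
    """Map metadata to a tuple of coarse input variables (hypothetical set)."""
    domain = mail.recipient.rsplit("@", 1)[-1].lower()
    office_hours = 9 <= mail.hour < 18
    return (mail.sender.lower(), domain, office_hours, mail.attachments > 0)


class PatternModel:
    """Counts how often each feature tuple occurs in historical mail."""

    def __init__(self, min_count=1):
        self.min_count = min_count
        self.counts = Counter()

    def fit(self, mails):
        self.counts.update(features(m) for m in mails)
        return self

    def is_suspicious(self, mail):
        # A pattern seen fewer than min_count times is treated as a
        # possible violation and would be passed on for review.
        return self.counts[features(mail)] < self.min_count


history = [
    MailMeta("alice@bank.example", "bob@partner.example", 10, 0),
    MailMeta("alice@bank.example", "carol@partner.example", 15, 0),
]
model = PatternModel().fit(history)

usual = MailMeta("alice@bank.example", "dave@partner.example", 11, 0)
unusual = MailMeta("alice@bank.example", "someone@webmail.example", 23, 1)
print(model.is_suspicious(usual), model.is_suspicious(unusual))
```

Reducing the recipient to a domain and the send time to an office-hours flag keeps the pattern space small, which is one way a simple counting scheme could generalize from limited history. A real deployment would need variables chosen for the specific organization, as the abstract argues.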