CoBAn: A context based model for data leakage prevention

Authors:
Gilad Katz;Yuval Elovici;Bracha Shapira
Venue:
Information Sciences: an International Journal
Year:
2014

Abstract

A new context-based model (CoBAn) for the prevention of accidental and intentional data leakage (DLP) is proposed. Existing methods attempt to prevent data leakage either by looking for specific keywords and phrases or by using various statistical methods. Keyword-based methods are not sufficiently accurate because they ignore the context in which a keyword appears, while statistical methods ignore the content of the analyzed text. The context-based approach we propose combines the advantages of both. The new model consists of two phases: training and detection. During the training phase, clusters of documents are generated and a graph representation of the confidential content of each cluster is created. This representation consists of key terms and the context in which they must appear in order to be considered confidential. During the detection phase, each tested document is assigned to several clusters, and its contents are then matched against each cluster's graph to determine whether the document is confidential. Extensive experiments show that the model is superior to other methods at detecting leakage attempts in which the confidential information is rephrased or otherwise differs from the original examples provided in the learning set.
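
The two phases described above can be illustrated with a toy Python sketch. This is not the authors' implementation: the abstract does not specify the clustering method, the term-selection criterion, or the graph-matching score, so everything below is an assumption made for illustration. Clusters are supplied by hand instead of being computed, a "key term" is any word seen only in confidential documents of a cluster, its "context" is the set of words within a fixed window (WINDOW), and a document is flagged when at least `threshold` key terms appear next to one of their recorded context words. All names (build_graph, train, detect, top_k, threshold) are hypothetical.

```python
import re
from collections import defaultdict

WINDOW = 3  # assumed context size: tokens inspected on each side of a key term


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def build_graph(docs):
    """docs: list of (text, is_confidential). Returns {key_term: set(context_terms)}."""
    public_vocab = set()
    for text, confidential in docs:
        if not confidential:
            public_vocab.update(tokenize(text))
    graph = defaultdict(set)
    for text, confidential in docs:
        if not confidential:
            continue
        tokens = tokenize(text)
        for i, term in enumerate(tokens):
            if term in public_vocab:
                continue  # also occurs in non-confidential text, so not a key term
            lo, hi = max(0, i - WINDOW), i + WINDOW + 1
            graph[term].update(t for t in tokens[lo:hi] if t != term)
    return dict(graph)


def train(clusters):
    """Training phase: one (vocabulary, graph) pair per hand-made cluster."""
    model = {}
    for name, docs in clusters.items():
        vocab = set()
        for text, _ in docs:
            vocab.update(tokenize(text))
        model[name] = (vocab, build_graph(docs))
    return model


def score_against_graph(tokens, graph):
    """Count key terms that occur near at least one of their context terms."""
    hits = 0
    for i, term in enumerate(tokens):
        context = graph.get(term)
        if context is None:
            continue
        lo, hi = max(0, i - WINDOW), i + WINDOW + 1
        if context & set(tokens[lo:hi]):
            hits += 1
    return hits


def detect(text, model, top_k=2, threshold=1):
    """Detection phase: check the document against its top_k closest clusters."""
    tokens = tokenize(text)
    token_set = set(tokens)
    ranked = sorted(model, key=lambda n: len(token_set & model[n][0]), reverse=True)
    return any(
        score_against_graph(tokens, model[name][1]) >= threshold
        for name in ranked[:top_k]
    )


clusters = {
    "finance": [
        ("the quarterly revenue forecast shows a merger with acme", True),
        ("the quarterly newsletter shows a picnic schedule", False),
    ],
    "hr": [
        ("salary bands for senior engineers will change", True),
        ("the office will close for holidays", False),
    ],
}
model = train(clusters)

print(detect("acme merger announced after revenue forecast", model))  # rephrased leak
print(detect("please discuss revenue", model))  # key term without its context
print(detect("the quarterly picnic schedule", model))  # no key terms at all
```

In this toy data the rephrased sentence is flagged because "acme" appears next to "merger", while "revenue" on its own is not, which is the distinction between keyword matching and context matching that the abstract draws. A realistic system would need stop-word handling, weighted terms, and an actual clustering step.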