A new context-based model (CoBAn) for accidental and intentional data leakage prevention (DLP) is proposed. Existing methods attempt to prevent data leakage either by looking for specific keywords and phrases or by using various statistical methods. Keyword-based methods are not sufficiently accurate because they ignore the context of the keyword, while statistical methods ignore the content of the analyzed text. The context-based approach we propose combines the advantages of both. The model consists of two phases: training and detection. During the training phase, clusters of documents are generated and a graph representation of the confidential content of each cluster is created. This representation consists of key terms and the context in which they must appear in order to be considered confidential. During the detection phase, each tested document is assigned to several clusters, and its contents are then matched against each cluster's graph to determine whether the document is confidential. Extensive experiments show that the model outperforms other methods in detecting leakage attempts in which the confidential information is rephrased or differs from the original examples provided in the learning set.
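The two phases described above can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the CoBAn algorithm itself: the abstract does not specify how clusters are formed, how key terms are chosen, or how matches are scored. Here the clusters are assumed to be given already, a "key term" is taken to be any word that appears only in a cluster's confidential documents, its "context" is the set of words seen within a fixed window around it, and a document is flagged when at least one key term occurs together with a known context word. All names (`train`, `detect`, `WINDOW`, `top_k`, `min_hits`) are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

WINDOW = 3  # assumed context size: tokens inspected on each side of a key term


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def train(clusters):
    """Build a centroid and a key-term -> context-terms graph per cluster.

    clusters: {name: [(text, is_confidential), ...]}
    Assumption: documents are already grouped; no clustering is done here.
    """
    model = {}
    for name, docs in clusters.items():
        centroid = Counter()
        public_terms = set()
        for text, is_conf in docs:
            tokens = tokenize(text)
            centroid.update(tokens)
            if not is_conf:
                public_terms.update(tokens)
        graph = defaultdict(set)
        for text, is_conf in docs:
            if not is_conf:
                continue
            tokens = tokenize(text)
            for i, tok in enumerate(tokens):
                if tok in public_terms:
                    continue  # also seen in non-confidential text: not a key term
                lo, hi = max(0, i - WINDOW), i + WINDOW + 1
                graph[tok].update(t for t in tokens[lo:hi] if t != tok)
        model[name] = (centroid, dict(graph))
    return model


def detect(model, text, top_k=2, min_hits=1):
    """Assign the document to its top_k closest clusters, then count key
    terms that occur near at least one of their known context terms."""
    tokens = tokenize(text)
    vec = Counter(tokens)
    ranked = sorted(model, key=lambda n: cosine(vec, model[n][0]), reverse=True)
    hits = 0
    for name in ranked[:top_k]:
        graph = model[name][1]
        for i, tok in enumerate(tokens):
            if tok not in graph:
                continue
            window = tokens[max(0, i - WINDOW): i + WINDOW + 1]
            if any(t in graph[tok] for t in window if t != tok):
                hits += 1
    return hits >= min_hits


# Made-up training data, for illustration only.
clusters = {
    "finance": [
        ("the quarterly report is published on the website", False),
        ("the merger with acme will close in march", True),
    ],
    "sports": [("the team won the match on sunday", False)],
}
model = train(clusters)

rephrased = detect(model, "we expect the acme merger to close soon")
harmless = detect(model, "the team will publish the report on the website")
print(rephrased, harmless)
```

In this toy data the first test document rephrases the confidential sentence and is flagged because "acme" appears next to "merger". The second contains the key term "will" but none of its context words, so it is not flagged. This is the distinction the abstract draws between context-based matching and plain keyword matching.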