Detecting data misuse by applying context-based data linkage

  • Authors: Ma'ayan Gafny; Asaf Shabtai; Lior Rokach; Yuval Elovici
  • Affiliations: Ben Gurion University, Beer-Sheva, Israel (all authors)
  • Venue: Proceedings of the 2010 ACM workshop on Insider threats
  • Year: 2010

Abstract

Detecting data leakage/misuse poses a great challenge for organizations. Whether caused by malicious intent or an inadvertent mistake, data leakage/misuse can diminish a company's brand, reduce shareholder value, and damage the company's goodwill and reputation. The challenge intensifies when the leakage/misuse to be detected or prevented is performed by an insider with legitimate permissions to access the organization's systems and its critical data. In this paper we propose a new approach for identifying suspicious insiders who can access data stored in a database via an application. In the proposed method, suspicious access to sensitive data is detected by analyzing the result-sets sent to the user in response to a request that the user submitted. Result-sets are analyzed within the instantaneous context in which the request was submitted. From the analysis of the result-set and the context we derive a "level of anomality". If the derived level is above a predefined threshold, an alert can be sent to the security officer. The proposed method applies data-linkage techniques to link the contextual features and the result-sets. Machine learning algorithms are then employed to generate a behavioral model during a learning phase. The behavioral model encapsulates knowledge about the behavior of a user, i.e., the characteristics of the result-sets of legitimate or malicious requests. This behavioral model is used to identify malicious requests based on their abnormality. An evaluation with sanitized data shows the usefulness of the proposed method in detecting data misuse.
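
The pipeline outlined in the abstract (link each request's context to its result-set, learn a behavioral model, derive an anomaly level, compare it to a threshold) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the contextual features, the linkage technique, or the learning algorithm, so the sketch substitutes a simple set-membership lookup for the machine-learning model. All names (`learn_model`, `anomaly_level`, `should_alert`), the example contexts and records, and the 0.5 threshold are hypothetical.

```python
from collections import defaultdict


def learn_model(training_requests):
    """Learning phase (assumed form): link each context to the records
    that legitimate requests returned under that context."""
    model = defaultdict(set)
    for context, result_set in training_requests:
        model[context].update(result_set)
    return model


def anomaly_level(model, context, result_set):
    """Fraction of returned records never seen under this context (0.0 to 1.0).
    A stand-in for the paper's anomaly measure, which the abstract does not define."""
    if not result_set:
        return 0.0
    known = model.get(context, set())
    unseen = sum(1 for record in result_set if record not in known)
    return unseen / len(result_set)


def should_alert(model, context, result_set, threshold=0.5):
    """Alert when the anomaly level exceeds the predefined threshold (value assumed)."""
    return anomaly_level(model, context, result_set) > threshold


# Hypothetical training data: context = (role, shift),
# record = (customer_city, customer_type).
training = [
    (("teller", "day"), [("Beer-Sheva", "private"), ("Beer-Sheva", "business")]),
    (("teller", "day"), [("Beer-Sheva", "private")]),
    (("auditor", "night"), [("Tel Aviv", "business")]),
]
model = learn_model(training)

normal = [("Beer-Sheva", "private")]
suspicious = [("Tel Aviv", "business"), ("Haifa", "private"), ("Beer-Sheva", "private")]

print(should_alert(model, ("teller", "day"), normal))
print(should_alert(model, ("teller", "day"), suspicious))
```

In this toy setup the same record can be normal in one context and anomalous in another, which is the point of analyzing result-sets within the context of the request rather than in isolation.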