Exploiting domain knowledge to detect outliers

  • Authors: Fabrizio Angiulli; Fabio Fassetti
  • Affiliations: DIMES Department, University of Calabria, Rende, Italy (both authors)
  • Venue: Data Mining and Knowledge Discovery
  • Year: 2014

Abstract

We present a novel definition of outlier that embeds available domain knowledge in the outlier discovery process. Specifically, given background knowledge encoded as a set of first-order rules, together with a set of positive and negative examples, our approach aims to single out the examples that exhibit abnormal behavior. The proposed technique is unsupervised, since no examples of normal or abnormal behavior are provided, although it is connected to supervised learning because it relies on induction from examples. We introduce a notion of compliance of a set of facts with respect to the background knowledge and a set of examples, and exploit it to detect the examples that prevent the induced hypothesis from generalizing better. By testing compliance with respect to both the direct and the dual concept, we can distinguish three kinds of abnormalities: irregular, anomalous, and outlier observations. This allows a finer characterization of the anomaly at hand and makes it possible to single out subtle forms of anomalies. We can also provide explanations for the abnormality of an observation, which make the reasons for its exceptionality intelligible. We present both exact and approximate algorithms for mining abnormalities; the approximate algorithms reduce execution time while guaranteeing good accuracy. Finally, we discuss peculiarities of the approach, present examples of mined knowledge, analyze the scalability of the algorithms, and compare the approach with noise-handling mechanisms and some alternative approaches.