Detecting outlying properties of exceptional objects

  • Authors:
  • Fabrizio Angiulli;Fabio Fassetti;Luigi Palopoli

  • Affiliations:
  • DEIS, Università della Calabria, Rende (CS), Italy;ICAR-CNR, Rende (CS), Italy;DEIS, Università della Calabria, Rende (CS), Italy

  • Venue:
  • ACM Transactions on Database Systems (TODS)
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Assume you are given a data population characterized by a certain number of attributes. Assume, moreover, you are provided with the information that one of the individuals in this data population is abnormal, but no reason whatsoever is given to you as to why this particular individual is to be considered abnormal. In several cases, you will be indeed interested in discovering such reasons. This article is precisely concerned with this problem of discovering sets of attributes that account for the (a priori stated) abnormality of an individual within a given dataset. A criterion is presented to measure the abnormality of combinations of attribute values featured by the given abnormal individual with respect to the reference population. In this respect, each subset of attributes is intended to somehow represent a “property” of individuals. We distinguish between global and local properties. Global properties are subsets of attributes explaining the given abnormality with respect to the entire data population. With local ones, instead, two subsets of attributes are singled out, where the former one justifies the abnormality within the data subpopulation selected using the values taken by the exceptional individual on those attributes included in the latter one. The problem of individuating abnormal properties with associated explanations is formally stated and analyzed. Such a formal characterization is then exploited in order to devise efficient algorithms for detecting both global and local forms of most abnormal properties. The experimental evidence, which is accounted for in the article, shows that the algorithms are both able to mine meaningful information and to accomplish the computational task by examining a negligible fraction of the search space.