Detecting outlying properties of exceptional objects

Authors:
Fabrizio Angiulli;Fabio Fassetti;Luigi Palopoli
Affiliations:
DEIS, Università della Calabria, Rende (CS), Italy;ICAR-CNR, Rende (CS), Italy;DEIS, Università della Calabria, Rende (CS), Italy
Venue:
ACM Transactions on Database Systems (TODS)
Year:
2009

Citing 25
Cited 5

The complexity of optimization problems

Journal of Computer and System Sciences - Structure in Complexity Theory Conference, June 2-5, 1986
A taxonomy of complexity classes of functions

Journal of Computer and System Sciences
The complexity of selecting maximal solutions

Information and Computation
An overview of data warehousing and OLAP technology

ACM SIGMOD Record
Explora: a multipattern and multistrategy discovery assistant

Advances in knowledge discovery and data mining
The art of computer programming, volume 3: (2nd ed.) sorting and searching

The art of computer programming, volume 3: (2nd ed.) sorting and searching
LOF: identifying density-based local outliers

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Efficient algorithms for mining outliers from large data sets

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Outlier detection for high dimensional data

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Computers and Intractability: A Guide to the Theory of NP-Completeness

Computers and Intractability: A Guide to the Theory of NP-Completeness
SLIQ: A Fast Scalable Classifier for Data Mining

EDBT '96 Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology
Discovery-Driven Exploration of OLAP Data Cubes

EDBT '98 Proceedings of the 6th International Conference on Extending Database Technology: Advances in Database Technology
Fast Outlier Detection in High Dimensional Spaces

PKDD '02 Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery
Algorithms for Mining Distance-Based Outliers in Large Datasets

VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Finding Intensional Knowledge of Distance-Based Outliers

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
SPRINT: A Scalable Parallel Classifier for Data Mining

VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
Distance-based outliers: algorithms and applications

The VLDB Journal — The International Journal on Very Large Data Bases
A Survey of Outlier Detection Methodologies

Artificial Intelligence Review
Outlier Mining in Large High-Dimensional Data Sets

IEEE Transactions on Knowledge and Data Engineering
Introduction to Data Mining, (First Edition)

Introduction to Data Mining, (First Edition)
Example-Based Robust Outlier Detection in High Dimensional Datasets

ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Detecting outlying subspaces for high-dimensional data: the new task, algorithms, and performance

Knowledge and Information Systems
A Novel Method for Detecting Outlying Subspaces in High-dimensional Databases Using Genetic Algorithm

ICDM '06 Proceedings of the Sixth International Conference on Data Mining
HOT: hypergraph-based outlier test for categorical data

PAKDD'03 Proceedings of the 7th Pacific-Asia conference on Advances in knowledge discovery and data mining

Finding key attribute subset in dataset for outlier detection

Knowledge-Based Systems
Mining special features to improve the performance of e-commerce product selection and resume processing

International Journal of Computational Science and Engineering
OutRules: a framework for outlier descriptions in multiple context spaces

ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part II
Mining multidimensional contextual outliers from categorical relational data

Proceedings of the 25th International Conference on Scientific and Statistical Database Management
Flexible and adaptive subspace search for outlier analysis

Proceedings of the 22nd ACM international conference on Conference on information & knowledge management

Quantified Score

Hi-index	0.00

Visualization

Abstract

Assume you are given a data population characterized by a certain number of attributes. Assume, moreover, you are provided with the information that one of the individuals in this data population is abnormal, but no reason whatsoever is given to you as to why this particular individual is to be considered abnormal. In several cases, you will be indeed interested in discovering such reasons. This article is precisely concerned with this problem of discovering sets of attributes that account for the (a priori stated) abnormality of an individual within a given dataset. A criterion is presented to measure the abnormality of combinations of attribute values featured by the given abnormal individual with respect to the reference population. In this respect, each subset of attributes is intended to somehow represent a “property” of individuals. We distinguish between global and local properties. Global properties are subsets of attributes explaining the given abnormality with respect to the entire data population. With local ones, instead, two subsets of attributes are singled out, where the former one justifies the abnormality within the data subpopulation selected using the values taken by the exceptional individual on those attributes included in the latter one. The problem of individuating abnormal properties with associated explanations is formally stated and analyzed. Such a formal characterization is then exploited in order to devise efficient algorithms for detecting both global and local forms of most abnormal properties. The experimental evidence, which is accounted for in the article, shows that the algorithms are both able to mine meaningful information and to accomplish the computational task by examining a negligible fraction of the search space.