Exploiting domain knowledge to detect outliers

Authors:
Fabrizio Angiulli; Fabio Fassetti
Affiliations:
DIMES Department, University of Calabria, Rende, Italy (both authors)
Venue:
Data Mining and Knowledge Discovery
Year:
2014

Abstract

We present a novel definition of outlier whose aim is to embed available domain knowledge in the process of discovering outliers. Specifically, given background knowledge, encoded by means of a set of first-order rules, and a set of positive and negative examples, our approach aims at singling out the examples showing abnormal behavior. The proposed technique is unsupervised, since no examples of normal or abnormal behavior are provided, although it is related to supervised learning because it is based on induction from examples. We introduce a notion of compliance of a set of facts with respect to the background knowledge and a set of examples, which is used to detect the examples that prevent the induced hypothesis from generalizing better. By testing compliance with respect to both the direct concept and the dual concept, we are able to distinguish among three kinds of abnormalities: irregular, anomalous, and outlier observations. This allows us to provide a finer characterization of the anomaly at hand and to single out subtle forms of anomalies. Moreover, we are able to provide explanations for the abnormality of an observation that make the reasons for its exceptionality intelligible. We present both exact and approximate algorithms for mining abnormalities; the approximate algorithms reduce execution time while guaranteeing good accuracy. Finally, we discuss the peculiarities of the novel approach, present examples of mined knowledge, analyze the scalability of the algorithms, and compare the approach with noise-handling mechanisms and some alternative approaches.
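The abstract does not spell out the compliance test, so what follows is only a toy, propositional sketch of the general idea under our own assumptions. Examples are described by sets of features (standing in for facts derived from the background knowledge); the "learner" is a greedy cover over single-feature rules rather than an inductive logic programming system; and generalization is proxied by description cost per positive example. An example is flagged for a concept if leaving it out strictly lowers that cost, and running the same test on the direct concept and on the dual concept (positives and negatives swapped) yields two flags per example. All names (`hypothesis_cost`, `noncompliant`, `abnormality_flags`) and the bird data are hypothetical, and the sketch does not attempt the paper's mapping of these outcomes to irregular, anomalous, and outlier observations.

```python
from fractions import Fraction
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical toy encoding: each example maps to the set of ground features
# (unary predicates) that the background knowledge makes true for it.
Facts = Dict[str, FrozenSet[str]]


def hypothesis_cost(facts: Facts, pos: Set[str], neg: Set[str]) -> Fraction:
    """Assumed proxy for generalization quality (lower is better).

    Learner (an assumption, not the paper's system): greedily pick
    single-feature rules 'target(X) :- f(X)' whose feature holds for no
    negative example. Cost = (#rules + #uncovered positives) / #positives.
    """
    if not pos:
        return Fraction(0)
    features = sorted(set().union(*(facts[e] for e in pos | neg)))
    admissible = [f for f in features if not any(f in facts[n] for n in neg)]
    uncovered, rules = set(pos), 0
    while uncovered:
        # Admissible feature covering the most still-uncovered positives
        # (ties broken by sorted feature order, so results are deterministic).
        best = max(admissible, key=lambda f: sum(f in facts[p] for p in uncovered), default=None)
        if best is None or not any(best in facts[p] for p in uncovered):
            break  # remaining positives cannot be covered by a consistent rule
        uncovered = {p for p in uncovered if best not in facts[p]}
        rules += 1
    return Fraction(rules + len(uncovered), len(pos))


def noncompliant(facts: Facts, pos: Set[str], neg: Set[str], e: str) -> bool:
    """True if leaving example e out lets the toy learner generalize strictly better."""
    return hypothesis_cost(facts, pos - {e}, neg - {e}) < hypothesis_cost(facts, pos, neg)


def abnormality_flags(facts: Facts, pos: Set[str], neg: Set[str]) -> Dict[str, Tuple[bool, bool]]:
    """(direct, dual) flags per example; the dual concept swaps positives and negatives."""
    return {e: (noncompliant(facts, pos, neg, e), noncompliant(facts, neg, pos, e))
            for e in sorted(pos | neg)}


# Hypothetical data: the concept is "flies".
facts = {
    "sparrow": frozenset({"bird", "wings", "small"}),
    "robin": frozenset({"bird", "wings", "small"}),
    "eagle": frozenset({"bird", "wings", "large"}),
    "bat": frozenset({"mammal", "wings", "small"}),
    "penguin": frozenset({"bird", "wings", "large"}),
    "dog": frozenset({"mammal", "large"}),
    "cat": frozenset({"mammal", "medium"}),
}
flies = {"sparrow", "robin", "eagle", "bat"}
grounded = {"penguin", "dog", "cat"}

for name, (direct, dual) in abnormality_flags(facts, flies, grounded).items():
    if direct or dual:
        print(name, direct, dual)
# bat False True
# eagle True True
# penguin True False
```

In this toy run the penguin blocks the general rule "wings, therefore flies" for the direct concept, the eagle and the bat block the dual rules "large, therefore does not fly" and "mammal, therefore does not fly", and the eagle is also the only flier that the learned rule "small" fails to cover. The remaining examples are compliant with both concepts.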