UNSUPERVISED ANOMALY DETECTION IN LARGE DATABASES USING BAYESIAN NETWORKS

Authors:
Antonio Cansado;Alvaro Soto
Affiliations:
Pontificia Universidad Católica de Chile, Santiago, Chile;Pontificia Universidad Católica de Chile, Santiago, Chile
Venue:
Applied Artificial Intelligence
Year:
2008

Abstract

Huge databases storing valuable information have proliferated in recent years. The opportunities for making effective use of these new data sources are enormous; however, the size and dimensionality of current large databases call for new ideas to scale up existing statistical and computational approaches. This article presents an application of artificial intelligence technology to the problem of automatically detecting candidate anomalous records in a large database. We build our approach with three main goals in mind: 1) effective detection of the records that are potentially anomalous; 2) suitable selection of the subset of attributes that explains what makes a record anomalous; and 3) an efficient implementation that allows us to scale the approach to large databases. Our algorithm, called the Bayesian network anomaly detector (BNAD), uses the joint probability density function (pdf) provided by a Bayesian network (BN) to achieve these goals. By using appropriate data structures, advanced caching techniques, the flexibility of Gaussian mixture models, and the efficiency of BNs in modeling joint pdfs, BNAD efficiently learns a suitable BN from a large dataset. We test BNAD on synthetic and real databases, the latter from the fields of manufacturing and astronomy, and obtain encouraging results.
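
The general idea of scoring records with the joint distribution of a BN can be illustrated with a minimal sketch. This is not the BNAD implementation: it assumes a fixed, hand-written network structure, purely discrete attributes with Laplace-smoothed conditional tables (the abstract mentions Gaussian mixture models instead), and a toy dataset invented for illustration. The names `fit_cpts`, `score`, `parents`, and the attribute names are all hypothetical. Records with the lowest joint log-probability are treated as candidate anomalies, and the per-attribute terms of the factorization hint at which attributes make a record unusual.

```python
import math
from collections import Counter

def fit_cpts(records, parents, alpha=1.0):
    """Estimate Laplace-smoothed tables P(attr | parents) and return a
    function giving the log-probability of one attribute of one record."""
    values = {a: {r[a] for r in records} for a in parents}
    joint = Counter()   # counts of (attr, parent values, attr value)
    margin = Counter()  # counts of (attr, parent values)
    for r in records:
        for a, pa in parents.items():
            key = tuple(r[p] for p in pa)
            joint[(a, key, r[a])] += 1
            margin[(a, key)] += 1

    def log_prob(a, r):
        key = tuple(r[p] for p in parents[a])
        num = joint[(a, key, r[a])] + alpha
        den = margin[(a, key)] + alpha * len(values[a])
        return math.log(num / den)

    return log_prob

def score(record, parents, log_prob):
    """Joint log-probability of a record under the BN factorization,
    plus the contribution of each attribute to that total."""
    contrib = {a: log_prob(a, record) for a in parents}
    return sum(contrib.values()), contrib

# Hypothetical toy data: two common patterns and one odd record at the end.
records = (
    [{"machine": "A", "temp": "low", "defect": "no"}] * 40
    + [{"machine": "B", "temp": "high", "defect": "no"}] * 40
    + [{"machine": "A", "temp": "high", "defect": "yes"}]
)

# Assumed structure (hand-written, not learned): machine -> temp -> defect.
parents = {"machine": (), "temp": ("machine",), "defect": ("temp",)}

log_prob = fit_cpts(records, parents)

# Rank record indices from least to most probable under the model.
ranked = sorted(range(len(records)),
                key=lambda i: score(records[i], parents, log_prob)[0])

total, contrib = score(records[ranked[0]], parents, log_prob)
print("most anomalous record:", records[ranked[0]])
print("joint log-probability:", round(total, 3))
print("per-attribute log-probabilities:", contrib)
```

In this sketch the odd record ranks first because its `temp` and `defect` terms are far less probable given their parents than in the common patterns, while its `machine` term is unremarkable. A real system along the lines described in the abstract would also have to learn the structure from data and handle continuous attributes, which this example deliberately leaves out.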