Anomaly detection is a highly valuable application in the analysis of today's very large datasets. Insurance companies, banks, and many manufacturing industries need systems that help people detect anomalies in their daily information. Anomalies generally make up a very small fraction of the data, so detecting them is not an easy task. The real source of an anomaly is usually given by specific values on a selective subset of the dataset's dimensions. Furthermore, many anomalies are not actually interesting to humans, because the interestingness of an anomaly is judged subjectively by the user. In this paper we propose a new semi-supervised algorithm that actively learns to detect relevant anomalies by interacting with an expert user to obtain semantic information about user preferences. Our approach is based on three main steps. First, a Bayesian network identifies an initial set of candidate anomalies. Next, a subspace clustering technique identifies relevant subsets of dimensions. Finally, a probabilistic active learning scheme, based on properties of the Dirichlet distribution, uses feedback from an expert user to search efficiently for relevant anomalies. Our results on synthetic and real datasets indicate that our approach correctly identifies relevant anomalies under noisy data and when anomalies present regular patterns.
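The abstract does not spell out how the Dirichlet-based feedback step works, so the following is only a minimal sketch of one plausible reading, not the paper's algorithm. It assumes that candidate anomalies have already been grouped (for example, by the subspace clustering step), keeps a two-category Dirichlet (relevant, irrelevant) per group, and picks the next group to show the expert by sampling from each posterior, a Thompson-sampling style choice that is our assumption. The class name `DirichletRelevanceModel`, its methods, and the group labels are all hypothetical.

```python
import random


class DirichletRelevanceModel:
    """Hypothetical sketch: one Dirichlet over (relevant, irrelevant) per group.

    Each group stands for a set of candidate anomalies, e.g. one subspace
    cluster. This is an illustrative assumption, not the authors' method.
    """

    def __init__(self, groups, prior=1.0):
        # Symmetric prior pseudo-counts: [relevant, irrelevant] for each group.
        self.alpha = {g: [prior, prior] for g in groups}

    def update(self, group, relevant):
        # Expert feedback adds one count to the matching category.
        index = 0 if relevant else 1
        self.alpha[group][index] += 1.0

    def posterior_mean(self, group):
        # Expected probability that an anomaly from this group is relevant.
        relevant, irrelevant = self.alpha[group]
        return relevant / (relevant + irrelevant)

    def sample(self, group, rng):
        # Draw from the Dirichlet by normalizing independent Gamma draws.
        draws = [rng.gammavariate(a, 1.0) for a in self.alpha[group]]
        return draws[0] / sum(draws)

    def next_query(self, rng):
        # Assumed selection rule: query the group whose sampled relevance
        # probability is highest, which balances exploring and exploiting.
        return max(self.alpha, key=lambda g: self.sample(g, rng))


def run_feedback_round(model, expert, rng):
    """Ask the (hypothetical) expert callback about one group and update."""
    group = model.next_query(rng)
    model.update(group, expert(group))
    return group
```

In use, `expert` would be a callback that shows the user an anomaly from the chosen group and returns `True` if the user marks it relevant. Groups that keep receiving positive feedback accumulate higher posterior means and are queried more often, while groups with little feedback still get sampled occasionally.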