Systematic construction of anomaly detection benchmarks from real data

Authors:
Andrew F. Emmott; Shubhomoy Das; Thomas Dietterich; Alan Fern; Weng-Keen Wong
Affiliations:
Oregon State University, Corvallis, Oregon (all authors)
Venue:
Proceedings of the ACM SIGKDD Workshop on Outlier Detection and Description
Year:
2013

Abstract

Research in anomaly detection suffers from a lack of realistic, publicly available problem sets. This paper discusses the properties such problem sets should possess. It then introduces a methodology for transforming existing classification data sets into ground-truthed benchmark data sets for anomaly detection. The methodology produces data sets that vary along three important dimensions: (a) point difficulty, (b) relative frequency of anomalies, and (c) clusteredness. We use the generated data sets to benchmark several popular anomaly detection algorithms under a range of conditions.
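
The abstract does not spell out the construction itself, so the following is only a minimal sketch of one way dimension (b), the relative frequency of anomalies, could be controlled when a labeled classification data set is turned into a ground-truthed benchmark. It assumes that one class is designated as the source of anomalies and that all other classes count as normal. The function name `make_benchmark` and its parameters are hypothetical and are not taken from the paper. Point difficulty and clusteredness are not modeled here.

```python
import random


def make_benchmark(rows, labels, anomaly_label, anomaly_rate, seed=0):
    """Build a ground-truthed anomaly benchmark from a labeled data set.

    Hypothetical helper (an assumption, not the paper's procedure): points
    labeled `anomaly_label` are candidate anomalies, all others are normal.
    Candidates are subsampled so that anomalies make up roughly
    `anomaly_rate` of the returned points.
    """
    if not 0.0 < anomaly_rate < 1.0:
        raise ValueError("anomaly_rate must be strictly between 0 and 1")

    rng = random.Random(seed)
    normals = [r for r, l in zip(rows, labels) if l != anomaly_label]
    candidates = [r for r, l in zip(rows, labels) if l == anomaly_label]

    # Solve k / (k + len(normals)) = anomaly_rate for k, then cap k at the
    # number of candidate anomalies actually available.
    k = round(anomaly_rate * len(normals) / (1.0 - anomaly_rate))
    k = min(k, len(candidates))
    anomalies = rng.sample(candidates, k)

    # Pair each point with its ground-truth flag and shuffle, so that the
    # position of a point carries no information about its label.
    data = [(r, False) for r in normals] + [(r, True) for r in anomalies]
    rng.shuffle(data)
    points = [r for r, _ in data]
    truth = [flag for _, flag in data]
    return points, truth


if __name__ == "__main__":
    # Toy data: 90 points of class "a" (normal), 10 of class "b" (anomalous).
    rows = [[float(i)] for i in range(100)]
    labels = ["a"] * 90 + ["b"] * 10
    points, truth = make_benchmark(rows, labels, "b", anomaly_rate=0.05)
    print(len(points), sum(truth))
```

Calling the helper repeatedly with different `anomaly_rate` values would yield a family of benchmarks from the same source data. The ground-truth flags are kept separate from the points so that an unsupervised detector can be scored without seeing them.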