Continuous outlier detection in data streams: an extensible framework and state-of-the-art algorithms

Authors:
Dimitrios Georgiadis; Maria Kontaki; Anastasios Gounaris; Apostolos N. Papadopoulos; Kostas Tsichlas; Yannis Manolopoulos
Affiliations:
Aristotle University, Thessaloniki, Greece (all six authors)
Venue:
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Year:
2013

References:

M-tree: An Efficient Access Method for Similarity Search in Metric Spaces. VLDB '97: Proceedings of the 23rd International Conference on Very Large Data Bases.
Algorithms for Mining Distance-Based Outliers in Large Datasets. VLDB '98: Proceedings of the 24th International Conference on Very Large Data Bases.
Data Streams: Models and Algorithms (Advances in Database Systems).
Neighbor-based pattern detection for windows over streaming data. Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology.
Distance-based outlier queries in data streams: the novel task and algorithms. Data Mining and Knowledge Discovery.
MOA: Massive Online Analysis. The Journal of Machine Learning Research.
Continuous monitoring of distance-based outliers over data streams. ICDE '11: Proceedings of the 2011 IEEE 27th International Conference on Data Engineering.

Abstract

Anomaly detection is an important data mining task that aims to discover elements deviating significantly from the expected behavior; such elements are termed outliers. One of the most widely used criteria for deciding whether an element is an outlier compares the number of its neighbors within a fixed distance R against a fixed threshold k: an element is an outlier if fewer than k other elements lie within distance R of it. Such outliers are referred to as distance-based outliers and are the focus of this work. In this demo, we show both an extensible framework for outlier detection algorithms and specific algorithms for the demanding case where outlier detection is performed continuously over a data stream. More specifically, i) we demonstrate a novel extension of MOA (Massive Online Analysis), an open-source, publicly available tool, that can host algorithms which continuously detect outliers, and ii) we present four online outlier detection algorithms. Two of these algorithms were designed by the authors of this demo to improve on key aspects of outlier mining, such as running time, flexibility, and space requirements.
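To make the (R, k) criterion and its continuous, stream-based setting concrete, here is a minimal Python sketch of a brute-force baseline. It assumes Euclidean distance and a count-based sliding window that reports outliers after every `slide` new arrivals. The names `distance_outliers` and `SlidingWindowOutlierDetector` and the `window_size`/`slide` parameters are hypothetical illustrations: this is not the MOA API and not one of the four algorithms presented in the demo. It simply recomputes every neighbor count at each slide, a quadratic per-slide cost that incremental stream algorithms aim to avoid.

```python
import math
from collections import deque
from typing import Deque, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def distance_outliers(points: Sequence[Point], R: float, k: int) -> List[int]:
    """Indices of points that have fewer than k other points within distance R."""
    outliers = []
    for i, p in enumerate(points):
        neighbors = 0
        for j, q in enumerate(points):
            if i != j and math.dist(p, q) <= R:
                neighbors += 1
                if neighbors >= k:
                    break  # k neighbors found: p is an inlier, stop early
        if neighbors < k:
            outliers.append(i)
    return outliers


class SlidingWindowOutlierDetector:
    """Hypothetical brute-force continuous detector over a count-based window."""

    def __init__(self, R: float, k: int, window_size: int, slide: int) -> None:
        self.R = R
        self.k = k
        self.slide = slide
        # A deque with maxlen drops the oldest point when a new one arrives.
        self.window: Deque[Point] = deque(maxlen=window_size)
        self._arrivals_since_report = 0

    def insert(self, point: Point) -> Optional[List[Point]]:
        """Add one stream element; every `slide` arrivals, return the window's outliers."""
        self.window.append(point)
        self._arrivals_since_report += 1
        if self._arrivals_since_report < self.slide:
            return None  # not at a slide boundary yet
        self._arrivals_since_report = 0
        current = list(self.window)
        # Recompute every neighbor count from scratch: O(W^2) distances per slide.
        return [current[i] for i in distance_outliers(current, self.R, self.k)]
```

For example, with R=0.5, k=2, a window of 4 points and a slide of 2, feeding the one-dimensional stream 0.0, 0.1, 0.2, 5.0, 0.15, 0.05 reports 5.0 as the only outlier at the second and third slides. At the first slide the window holds just two points, so both are flagged, which shows why window warm-up needs care in practice.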