Continuous monitoring of distance-based outliers over data streams

Authors:
Maria Kontaki;Anastasios Gounaris;Apostolos N. Papadopoulos;Kostas Tsichlas;Yannis Manolopoulos
Affiliations:
Department of Informatics, Aristotle University, 54124 Thessaloniki, Greece;Department of Informatics, Aristotle University, 54124 Thessaloniki, Greece;Department of Informatics, Aristotle University, 54124 Thessaloniki, Greece;Department of Informatics, Aristotle University, 54124 Thessaloniki, Greece;Department of Informatics, Aristotle University, 54124 Thessaloniki, Greece
Venue:
ICDE '11 Proceedings of the 2011 IEEE 27th International Conference on Data Engineering
Year:
2011

Citing 0
Cited 2

Continuous outlier detection in data streams: an extensible framework and state-of-the-art algorithms
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data

Fast top-k distance-based outlier detection on uncertain data
WAIM'13 Proceedings of the 14th international conference on Web-Age Information Management

Abstract

Anomaly detection is considered an important data mining task, aiming at the discovery of elements (also known as outliers) that deviate significantly from the expected case. More specifically, given a set of objects, the problem is to return the suspicious objects that deviate significantly from the typical behavior. As in the case of clustering, the application of different criteria leads to different definitions of an outlier. In this work, we focus on distance-based outliers: an object x is an outlier if fewer than k objects lie at distance at most R from x. The problem poses significant challenges in a stream-based environment, where data arrive continuously and outliers must be detected on the fly. A few research works study the problem of continuous outlier detection. However, none of these proposals meets the requirements of modern stream-based applications, for the following reasons: (i) they demand a significant storage overhead, (ii) their efficiency is limited, and (iii) they lack flexibility. In this work, we propose new algorithms for continuous outlier monitoring in data streams, based on sliding windows. Our techniques reduce the required storage overhead, run faster than previously proposed techniques, and offer significant flexibility. Experiments performed on real-life as well as synthetic data sets verify our theoretical study.
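
The distance-based definition in the abstract can be illustrated with a minimal brute-force sketch. This is not one of the algorithms proposed in the paper: it simply recomputes all neighbor counts after every arrival, which is the kind of cost the paper aims to avoid. The sketch assumes Euclidean distance, a count-based sliding window, and that an object is not counted as its own neighbor. The function names `window_outliers` and `monitor` are hypothetical and chosen for illustration only.

```python
from collections import deque
from math import dist  # Euclidean distance, available since Python 3.8


def window_outliers(window, R, k):
    """Return the points in `window` that have fewer than k neighbors
    within distance R (a point is not its own neighbor; an assumption)."""
    points = list(window)
    outliers = []
    for i, x in enumerate(points):
        # Count the other points lying at distance at most R from x.
        neighbors = sum(
            1 for j, y in enumerate(points) if j != i and dist(x, y) <= R
        )
        if neighbors < k:
            outliers.append(x)
    return outliers


def monitor(stream, window_size, R, k):
    """Yield the current outliers after each arrival, using a count-based
    sliding window that keeps only the `window_size` most recent points."""
    window = deque(maxlen=window_size)  # the oldest point is evicted automatically
    for point in stream:
        window.append(point)
        yield window_outliers(window, R, k)
```

For example, `list(monitor(points, window_size=4, R=0.5, k=2))` returns one outlier list per arriving point. Each step costs time quadratic in the window size, so the sketch is useful only for checking the definition on small inputs.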