Efficient Clustering-Based Outlier Detection Algorithm for Dynamic Data Stream

  • Authors:
  • Manzoor Elahi;Kun Li;Wasif Nisar;Xinjie Lv;Hongan Wang

  • Venue:
  • FSKD '08 Proceedings of the 2008 Fifth International Conference on Fuzzy Systems and Knowledge Discovery - Volume 05
  • Year:
  • 2008

Abstract

Anomaly detection is an important and active research problem in many fields and is involved in numerous applications. Most existing methods are based on a distance measure, but for data streams these methods are not very efficient from a computational point of view. Because memory is limited compared to the huge volume of a data stream, most existing work on outlier detection in data streams declares a point an outlier or inlier as soon as it arrives. Declaring an outlier on arrival can often lead to a wrong decision because of the dynamic nature of the incoming data. In this paper we introduce a clustering-based approach that divides the stream into chunks and clusters each chunk into a fixed number of clusters using k-means. Instead of keeping only summary information, as is often done when clustering data streams, we keep the candidate outliers and the mean value of every cluster for a fixed number of subsequent stream chunks, to make sure that the detected candidate outliers are real outliers. By using the mean values of the clusters of the previous chunk together with the mean values of the current chunk, we make a better decision about the outlierness of data stream objects. Several experiments on different datasets confirm that our technique finds better outliers at lower computational cost than other existing distance-based approaches to outlier detection in data streams.
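
The chunk-and-recheck idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how candidate outliers are selected, so the rule used here (a point is a candidate when it lies farther than `radius` from every cluster mean that has at least `min_size` members) is an assumption, as are the parameter names `radius`, `min_size`, and `hold`, the function names, and the deterministic farthest-first seeding used in place of random k-means initialization.

```python
import math


def init_centroids(points, k):
    # Assumption: farthest-first seeding, chosen only to keep the sketch deterministic.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iters=10):
    """Lloyd's k-means on a list of tuples; returns (mean, cluster_size) pairs."""
    centroids = init_centroids(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        # Recompute each mean; an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return [(c, len(g)) for c, g in zip(centroids, groups)]


def detect_outliers(stream, chunk_size, k, radius, min_size, hold):
    """Return (confirmed_outliers, still_pending_candidates)."""
    confirmed, candidates, prev_means = [], [], []
    for start in range(0, len(stream), chunk_size):
        chunk = stream[start:start + chunk_size]
        # Keep only the means of clusters with enough members (assumed rule).
        means = [m for m, size in kmeans(chunk, k) if size >= min_size]
        # Compare against the previous chunk's means together with the current ones.
        reference = prev_means + means

        def is_far(p):
            return all(math.dist(p, m) > radius for m in reference)

        pending = []
        for point, age in candidates:
            if not is_far(point):
                continue  # a dense cluster now lies near the point: treat it as an inlier
            if age + 1 >= hold:
                confirmed.append(point)  # stayed isolated for `hold` later chunks
            else:
                pending.append((point, age + 1))
        # Points of the current chunk that are far from every mean become new candidates.
        candidates = pending + [(p, 0) for p in chunk if is_far(p)]
        prev_means = means
    return confirmed, [p for p, _ in candidates]
```

With `hold=2`, for example, a point flagged in one chunk is reported as an outlier only if no sufficiently populated cluster mean appears within `radius` of it in the following two chunks. Candidates still waiting when the stream ends are returned separately, so the caller decides how to treat them.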