Efficient outlier detection algorithm for heterogeneous data streams

Authors:
Jiadong Ren;Qunhui Wu;Jia Zhang;Jiadong Ren;Changzhen Hu
Affiliations:
College of Information Science and Engineering, Yanshan University, Qinhuangdao City, P.R.China;College of Information Science and Engineering, Yanshan University, Qinhuangdao City, P.R.China;College of Information Science and Engineering, Yanshan University, Qinhuangdao City, P.R.China;School of Computer Science and Technology, Beijing Institute of Technology, Beijing City, P.R.China;School of Computer Science and Technology, Beijing Institute of Technology, Beijing City, P.R.China
Venue:
FSKD'09 Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 5
Year:
2009

Abstract

Outlier mining in data streams is an important and active research topic in anomaly detection. Most existing outlier detection algorithms can handle only numeric attributes or only categorical attributes. In this paper, we propose an efficient outlier detection algorithm for heterogeneous data streams, which partitions the stream into chunks. Each chunk is then clustered, and the clustering results are stored in cluster references. The representation degree and the number of adjacent cluster references are computed for each cluster reference to generate the final outlier references, which contain the potential outliers. Experimental results show that our approach achieves higher detection precision and better scalability.
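
The pipeline outlined in the abstract (chunking, per-chunk clustering, cluster references, outlier references) can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration and not the authors' method: records are taken to be pairs of a numeric list and a categorical tuple, the mixed distance is Euclidean distance plus a weighted count of categorical mismatches, clustering is a simple single-pass threshold scheme, the representation degree is taken to be the fraction of a chunk's records absorbed by a cluster, and two cluster references count as adjacent when their centers lie within `adj_radius`. All function names, parameters, and thresholds are hypothetical.

```python
from dataclasses import dataclass
from itertools import islice
from math import sqrt


@dataclass
class ClusterRef:
    """Summary of one cluster (hypothetical structure, not from the paper)."""
    num_center: list      # running mean of the numeric attributes
    cat_center: tuple     # categorical values of the first member (simplification)
    count: int = 1        # number of records absorbed by the cluster
    rep_degree: float = 0.0  # assumed: share of the chunk's records in this cluster


def mixed_distance(num_a, cat_a, num_b, cat_b, gamma=1.0):
    """Assumed mixed distance: Euclidean part plus gamma per categorical mismatch."""
    numeric = sqrt(sum((x - y) ** 2 for x, y in zip(num_a, num_b)))
    mismatches = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric + gamma * mismatches


def chunks(stream, size):
    """Partition an iterable stream into consecutive lists of at most `size` records."""
    it = iter(stream)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def cluster_chunk(chunk, radius):
    """Single-pass clustering: join the nearest cluster within `radius`, else open a new one."""
    refs = []
    for num, cat in chunk:
        best = min(
            refs,
            key=lambda r: mixed_distance(num, cat, r.num_center, r.cat_center),
            default=None,
        )
        if best is not None and mixed_distance(num, cat, best.num_center, best.cat_center) <= radius:
            best.count += 1
            # Incremental update of the numeric mean.
            best.num_center = [c + (x - c) / best.count for c, x in zip(best.num_center, num)]
        else:
            refs.append(ClusterRef(list(num), tuple(cat)))
    for r in refs:
        r.rep_degree = r.count / len(chunk)
    return refs


def outlier_refs(refs, min_rep, adj_radius, min_adjacent):
    """Flag references that are both weakly represented and have few adjacent references."""
    flagged = []
    for r in refs:
        adjacent = sum(
            1
            for o in refs
            if o is not r
            and mixed_distance(r.num_center, r.cat_center, o.num_center, o.cat_center) <= adj_radius
        )
        if r.rep_degree < min_rep and adjacent < min_adjacent:
            flagged.append(r)
    return flagged


def detect(stream, chunk_size=6, radius=1.5, min_rep=0.2, adj_radius=3.0, min_adjacent=1):
    """Run the whole sketch: chunk the stream, cluster each chunk, then flag outlier references."""
    all_refs = []
    for chunk in chunks(stream, chunk_size):
        all_refs.extend(cluster_chunk(chunk, radius))
    return outlier_refs(all_refs, min_rep, adj_radius, min_adjacent)
```

Calling `detect` on an iterable of `(numeric_list, categorical_tuple)` records returns the flagged cluster references. Because only the compact references are kept after each chunk is processed, memory grows with the number of clusters and not with the number of records, which is the usual motivation for summarizing a stream this way.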