Continuous kernel-based outlier detection over distributed data streams

Authors:
Liang Su; Weihong Han; Peng Zou; Yan Jia
Affiliations:
School of Computer Science, National University of Defense Technology, Changsha, China (all four authors)
Venue:
ISPA'07 Proceedings of the 2007 international conference on Frontiers of High Performance Computing and Networking
Year:
2007

Abstract

Stream data are often transmitted over a distributed network but are, in many cases, too voluminous to collect at a central location. Instead, computation must be performed in a distributed fashion while guaranteeing high-quality results in real time, even as new data arrive. In this paper, we first formalize the problem of continuous outlier detection over distributed, evolving data streams. We then propose two novel outlier measures and algorithms that identify outliers in a single pass. Experiments on synthetic and real data show that the proposed methods are both efficient and effective compared with existing outlier detection algorithms.
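
The abstract does not define the two outlier measures, so the following is only a generic illustration of single-pass, kernel-based outlier flagging, not the authors' algorithm. It assumes a Gaussian kernel density estimate over a sliding window of one-dimensional values; the class name `SlidingWindowKDE`, the helper `merged_density`, and the parameters `window`, `bandwidth`, `threshold`, and `warmup` are all hypothetical.

```python
import math
from collections import deque


class SlidingWindowKDE:
    """Gaussian kernel density estimate over the most recent `window` values.

    Illustrative only: the estimator and all parameters are assumptions,
    not the outlier measures proposed in the paper.
    """

    def __init__(self, window=200, bandwidth=1.0):
        self.values = deque(maxlen=window)  # oldest values are evicted automatically
        self.bandwidth = bandwidth

    def density(self, x):
        """Estimated density at x from the values currently in the window."""
        if not self.values:
            return 0.0
        h = self.bandwidth
        norm = 1.0 / (len(self.values) * h * math.sqrt(2.0 * math.pi))
        return norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in self.values)

    def observe(self, x, threshold=0.01, warmup=20):
        """Score x against earlier values, then add it to the window.

        Each value is examined once on arrival (single pass). Returns True
        if the window holds at least `warmup` values and the density at x
        is below `threshold`.
        """
        is_outlier = len(self.values) >= warmup and self.density(x) < threshold
        self.values.append(x)
        return is_outlier


def merged_density(sites, x):
    """Density at x over the pooled windows of several sites.

    With a shared bandwidth, the pooled estimate equals the size-weighted
    average of the per-site estimates, so only one number per site needs
    to be exchanged for each query point.
    """
    total = sum(len(s.values) for s in sites)
    if total == 0:
        return 0.0
    return sum(len(s.values) * s.density(x) for s in sites) / total
```

In this sketch each site would run its own `SlidingWindowKDE`, and a coordinator could call `merged_density` to approximate a global view without collecting the raw stream centrally. Scoring a point costs time proportional to the window size, which a practical system would likely reduce through sampling or summarization.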