Outlier detection over sliding windows for probabilistic data streams

Authors:
Bin Wang;Xiao-Chun Yang;Guo-Ren Wang;Ge Yu
Affiliations:
School of Information Science and Engineering, Northeastern University, Shenyang, China and Key Laboratory of Medical Image Computing, Northeastern University, Ministry of Education, Shenyang, Chi ...;School of Information Science and Engineering, Northeastern University, Shenyang, China and Key Laboratory of Medical Image Computing, Northeastern University, Ministry of Education, Shenyang, Chi ...;School of Information Science and Engineering, Northeastern University, Shenyang, China and Key Laboratory of Medical Image Computing, Northeastern University, Ministry of Education, Shenyang, Chi ...;School of Information Science and Engineering, Northeastern University, Shenyang, China and Key Laboratory of Medical Image Computing, Northeastern University, Ministry of Education, Shenyang, Chi ...
Venue:
Journal of Computer Science and Technology
Year:
2010

Abstract

Outlier detection is a useful technique in many applications where data is generally uncertain and can be described using probabilities. While it has been studied intensively for deterministic data, outlier detection is still novel in the emerging field of uncertain data. In this paper, we study the semantics of outlier detection on probabilistic data streams and present a new definition of a distance-based outlier over a sliding window. We then show that the problem of detecting an outlier over a set of possible world instances is equivalent to the problem of finding the k-th element in its neighborhood. Based on this observation, a dynamic programming algorithm (DPA) is proposed to reduce the detection cost from O(2^|R(e,d)|) to O(k·|R(e,d)|), where R(e,d) is the d-neighborhood of e. Furthermore, we propose a pruning-based approach (PBA) that filters non-outliers effectively and efficiently on a single window and incrementally detects outliers among the m most recent elements. Finally, detailed analysis and thorough experimental results demonstrate the efficiency and scalability of our approach.
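
The cost reduction mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's DPA, whose exact definitions are not given here. It assumes (hypothetically) that each element of the d-neighborhood R(e,d) exists independently with a known probability, and that the quantity of interest is the probability that fewer than k neighbors exist. The function names and the independence assumption are illustrative only. The dynamic program does O(k·|R(e,d)|) work, while the brute-force check enumerates all 2^|R(e,d)| possible worlds.

```python
from itertools import product


def prob_fewer_than_k(neighbor_probs, k):
    """Probability that fewer than k neighbors exist (assumes independence)."""
    if k <= 0:
        return 0.0
    # dp[j] = probability that exactly j of the neighbors seen so far exist,
    # tracked only for j < k; mass that reaches k or more is dropped.
    dp = [0.0] * k
    dp[0] = 1.0
    for p in neighbor_probs:
        # Go downward so dp[j - 1] still holds the value from before this neighbor.
        for j in range(k - 1, 0, -1):
            dp[j] = dp[j] * (1.0 - p) + dp[j - 1] * p
        dp[0] *= 1.0 - p
    return sum(dp)


def prob_fewer_than_k_bruteforce(neighbor_probs, k):
    """Same quantity by enumerating every possible world (exponential cost)."""
    total = 0.0
    for world in product([0, 1], repeat=len(neighbor_probs)):
        if sum(world) < k:
            weight = 1.0
            for present, p in zip(world, neighbor_probs):
                weight *= p if present else 1.0 - p
            total += weight
    return total
```

Under these assumptions, an element could be flagged as an outlier when `prob_fewer_than_k(probs, k)` exceeds a user-chosen threshold. That threshold is also an assumption of this sketch and is not taken from the abstract.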