Fast top-k distance-based outlier detection on uncertain data

Authors:
Salman Ahmed Shaikh;Hiroyuki Kitagawa
Affiliations:
Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, Ibaraki, Japan;Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, Ibaraki, Japan
Venue:
WAIM'13 Proceedings of the 14th international conference on Web-Age Information Management
Year:
2013

Abstract

This paper studies top-k distance-based outlier detection on uncertain data. Each uncertain object is modelled by the probability density function of a Gaussian distribution. We start with a naive approach. We then introduce the populated-cell list (PC-list), a sorted list of the non-empty cells of the grid used to index the data. Using the PC-list, our top-k outlier detection algorithm needs to consider only a fraction of the objects in the dataset and hence quickly identifies candidate top-k outliers. We also present an approximate top-k outlier detection algorithm that further improves efficiency. An extensive empirical study on synthetic and real datasets shows that the proposed approaches are efficient and scalable.
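To make the naive baseline concrete, the following is a minimal sketch under stated assumptions, not the authors' algorithm. It assumes one-dimensional objects that share a single standard deviation `sigma`, and it scores each object by its expected number of neighbours within a distance `d`; the abstract does not give the scoring rule, so that choice and all function names here are hypothetical. The only fact relied on is that the difference of two independent Gaussians N(mu_a, sigma^2) and N(mu_b, sigma^2) is itself Gaussian with mean mu_a - mu_b and variance 2*sigma^2.

```python
import heapq
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def within_distance_prob(mu_a, mu_b, sigma, d):
    """Pr(|A - B| <= d) for independent A ~ N(mu_a, sigma^2), B ~ N(mu_b, sigma^2).

    A - B is Gaussian with mean mu_a - mu_b and variance 2 * sigma^2.
    """
    s = math.sqrt(2.0) * sigma
    delta = mu_a - mu_b
    return normal_cdf((d - delta) / s) - normal_cdf((-d - delta) / s)


def naive_top_k_outliers(means, sigma, d, k):
    """Return indices of the k objects with the fewest expected d-neighbours.

    Assumption: "expected neighbour count" stands in for the outlier score;
    the paper's actual definition may differ.
    """
    scores = []
    for i, mu_i in enumerate(means):
        # Compare object i against every other object: O(n^2) overall.
        expected = sum(
            within_distance_prob(mu_i, mu_j, sigma, d)
            for j, mu_j in enumerate(means)
            if j != i
        )
        scores.append((expected, i))
    return [i for _, i in heapq.nsmallest(k, scores)]


# Four clustered objects and two isolated ones (indices 4 and 5).
means = [0.0, 0.1, 0.2, 0.3, 5.0, 9.0]
top2 = naive_top_k_outliers(means, sigma=0.1, d=1.0, k=2)
```

Every object is compared with every other object, so the cost grows quadratically with the dataset size. This is the kind of exhaustive work that a grid index and a list of populated cells would aim to avoid by restricting attention to a fraction of the objects.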