This paper studies top-k distance-based outlier detection on uncertain data, where each uncertain object is modelled by a Gaussian probability density function. We start with a naive approach. We then introduce the populated-cell list (PC-list), a sorted list of the non-empty cells of the grid used to index the data. Using the PC-list, our top-k outlier detection algorithm needs to consider only a fraction of the objects in the dataset and can therefore quickly identify candidate top-k outliers. We also present an approximate top-k outlier detection algorithm that further improves efficiency. An extensive empirical study on synthetic and real datasets shows that the proposed approaches are efficient and scalable.
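
To make the PC-list idea concrete, the following is a minimal sketch of how a sorted list of non-empty grid cells might be built. It is not the paper's implementation: the function name `build_pc_list`, the `cell_width` parameter, the use of each object's mean as its grid position, and the sort order (fewest objects first) are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import defaultdict
from math import floor


def build_pc_list(points, cell_width):
    """Return the non-empty grid cells as (cell_id, member_indices) pairs.

    Hypothetical helper: each point stands in for the mean of one Gaussian
    object, and cells are ordered sparsest first (an assumed sort order).
    """
    cells = defaultdict(list)
    for idx, point in enumerate(points):
        # Map each coordinate to the integer index of the cell containing it.
        cell_id = tuple(floor(coord / cell_width) for coord in point)
        cells[cell_id].append(idx)
    # Only populated cells exist in the dict, so empty cells are never stored.
    return sorted(cells.items(), key=lambda item: (len(item[1]), item[0]))


# Four objects cluster near the origin; one lies far away.
points = [(0.1, 0.2), (0.3, 0.4), (0.2, 0.1), (5.5, 5.5), (0.9, 0.8)]
pc_list = build_pc_list(points, cell_width=1.0)

for cell_id, members in pc_list:
    print(cell_id, members)
```

Under the assumed ordering, objects in sparsely populated cells appear at the front of the list, so a detection pass could examine likely outlier candidates first and stop before visiting dense cells.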