New methods for deviation-based outlier detection in large database

Authors:
Zhiyuan Zhang; Xia Feng
Affiliations:
School of Computer Science & Technology, Civil Aviation University of China, Tianjin;School of Computer Science & Technology, Civil Aviation University of China, Tianjin
Venue:
FSKD'09 Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 1
Year:
2009

Abstract

Outlier (also called deviation or exception) detection is an important function in data mining. Among the approaches to identifying outliers, the deviation-based approach has many advantages and has drawn much attention. Although a linear-time algorithm for sequential deviation detection has been proposed, it is not stable and always misses many deviation points. In this paper, we present three algorithms for detecting deviations. The first algorithm runs in time proportional to the square of the dataset length, and the second runs in time proportional to the square of the number of distinct data values. These two algorithms produce the same result, but the latter is much more efficient than the former. In the third algorithm, a deviation factor is defined to help find deviation points. Although it yields only approximate results, it is the most efficient of the three, especially for large datasets with many distinct values.
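The abstract does not give the algorithms themselves, so the following is only a minimal sketch of why a computation that is quadratic in the dataset length can be redone in time quadratic in the number of distinct values without changing the result. The score used here (the sum of squared differences from a point to every other point) is a stand-in chosen for illustration, not the deviation measure defined in the paper, and the function names are hypothetical.

```python
from collections import Counter


def scores_by_points(data):
    """Quadratic in len(data): compare every point with every other point."""
    return [sum((x - y) ** 2 for y in data) for x in data]


def scores_by_distinct_values(data):
    """Quadratic in the number of distinct values: weight each value by its count."""
    counts = Counter(data)
    per_value = {
        v: sum(c * (v - w) ** 2 for w, c in counts.items())
        for v in counts
    }
    # Every occurrence of the same value receives the same score.
    return [per_value[x] for x in data]


# Illustrative data only (an assumption, not taken from the paper).
data = [3, 3, 4, 3, 4, 4, 3, 20, 3, 4]
slow = scores_by_points(data)
fast = scores_by_distinct_values(data)
assert slow == fast  # both versions give identical scores

# Index of the point with the largest score under this stand-in measure.
most_deviant = max(range(len(data)), key=lambda i: fast[i])
```

With ten points but only three distinct values, the second function evaluates 9 value pairs instead of 100 point pairs. The saving shrinks as the number of distinct values approaches the dataset length, which is consistent with the abstract's motivation for a third, approximate algorithm.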