Fast outlier detection for very large log data

Authors:
Seung Kim;Nam Wook Cho;Bokyoung Kang;Suk-Ho Kang
Affiliations:
Dept. of Industrial Engineering, Seoul National University, 599 Gwanak-ro, Gwanak-gu, Seoul 151-744, Republic of Korea;Dept. of Industrial and Information Systems Engineering, Seoul National University of Technology, 172 Gongreung 2-dong, Nowon-gu, Seoul 139-743, Republic of Korea;Dept. of Industrial Engineering, Seoul National University, 599 Gwanak-ro, Gwanak-gu, Seoul 151-744, Republic of Korea;Dept. of Industrial Engineering, Seoul National University, 599 Gwanak-ro, Gwanak-gu, Seoul 151-744, Republic of Korea
Venue:
Expert Systems with Applications: An International Journal
Year:
2011

Citing 12
Cited 0

Introduction to statistical pattern recognition (2nd ed.)

Introduction to statistical pattern recognition (2nd ed.)
K-d trees for semidynamic point sets

SCG '90 Proceedings of the sixth annual symposium on Computational geometry
FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets

SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
BIRCH: an efficient data clustering method for very large databases

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
CURE: an efficient clustering algorithm for large databases

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Automatic subspace clustering of high dimensional data for data mining applications

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
LOF: identifying density-based local outliers

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Efficient algorithms for mining outliers from large data sets

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
An Algorithm for Finding Best Matches in Logarithmic Expected Time

ACM Transactions on Mathematical Software (TOMS)
Outlier detection for high dimensional data

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Event Detection and Analysis from Video Streams

IEEE Transactions on Pattern Analysis and Machine Intelligence
Constructing Bayesian Networks to Predict Uncollectible Telecommunications Accounts

IEEE Expert: Intelligent Systems and Their Applications

Quantified Score

Hi-index	12.05

Visualization

Abstract

Density-based outlier detection identifies an outlying observation with reference to the density of the surrounding space. In spite of the several advantages of density-based outlier detections, its computational complexity remains one of the major barriers to its application. The purpose of the present study is to reduce the computation time of LOF (Local Outlier Factor), a density-based outlier detection algorithm. The proposed method incorporates kd-tree indexing and an approximated k-nearest neighbors search algorithm (ANN). Theoretical analysis on the approximation of nearest neighbor search was conducted. A set of experiments was conducted to examine the performance of the proposed algorithm. The results show that the method can effectively detect local outliers in a reduced computation time.