Fast outlier detection for very large log data

  • Authors:
  • Seung Kim; Nam Wook Cho; Bokyoung Kang; Suk-Ho Kang

  • Affiliations:
  • Dept. of Industrial Engineering, Seoul National University, 599 Gwanak-ro, Gwanak-gu, Seoul 151-744, Republic of Korea; Dept. of Industrial and Information Systems Engineering, Seoul National University of Technology, 172 Gongreung 2-dong, Nowon-gu, Seoul 139-743, Republic of Korea; Dept. of Industrial Engineering, Seoul National University, 599 Gwanak-ro, Gwanak-gu, Seoul 151-744, Republic of Korea; Dept. of Industrial Engineering, Seoul National University, 599 Gwanak-ro, Gwanak-gu, Seoul 151-744, Republic of Korea

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2011

Quantified Score

Hi-index 12.05

Abstract

Density-based outlier detection identifies an outlying observation with reference to the density of the surrounding space. Despite the several advantages of density-based outlier detection, its computational complexity remains one of the major barriers to its application. The purpose of the present study is to reduce the computation time of LOF (Local Outlier Factor), a density-based outlier detection algorithm. The proposed method incorporates kd-tree indexing and an approximate k-nearest neighbors (ANN) search algorithm. A theoretical analysis of the nearest neighbor search approximation was conducted, and a set of experiments was conducted to examine the performance of the proposed algorithm. The results show that the method can effectively detect local outliers in reduced computation time.
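
The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (build_kdtree, knn, lof_scores), the tolerance parameter eps, and the pruning rule are assumptions made for illustration, and each point is given exactly k neighbors (ties at the k-th distance are not expanded, and k is assumed smaller than the number of points). The sketch builds a kd-tree, runs a k-nearest neighbors search that skips subtrees unable to improve the current k-th distance by more than a factor of (1 + eps), and computes LOF scores from the resulting neighbor lists.

```python
import heapq
import math


def build_kdtree(points, indices=None, depth=0):
    """Build a kd-tree as nested tuples: (point_index, axis, left, right)."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[0])  # cycle through the coordinates
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return (indices[mid], axis,
            build_kdtree(points, indices[:mid], depth + 1),
            build_kdtree(points, indices[mid + 1:], depth + 1))


def knn(points, tree, q, k, eps=0.0):
    """Return k (distance, index) pairs near points[q], excluding q itself.

    eps is a hypothetical tolerance: with eps > 0 a subtree is skipped when
    it cannot hold a point closer than (current k-th distance) / (1 + eps),
    so the result is approximate. With eps = 0 the search is exact.
    """
    target = points[q]
    heap = []  # max-heap on distance, stored as (-distance, index)

    def visit(node):
        if node is None:
            return
        i, axis, left, right = node
        if i != q:
            d = math.dist(target, points[i])
            if len(heap) < k:
                heapq.heappush(heap, (-d, i))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, i))
        diff = target[axis] - points[i][axis]
        near, far = (left, right) if diff < 0 else (right, left)
        visit(near)
        # |diff| is a lower bound on the distance to anything in `far`.
        if len(heap) < k or abs(diff) * (1 + eps) < -heap[0][0]:
            visit(far)

    visit(tree)
    return sorted((-neg_d, i) for neg_d, i in heap)


def lof_scores(points, k=5, eps=0.0):
    """Compute a Local Outlier Factor score for every point."""
    tree = build_kdtree(points)
    neigh = [knn(points, tree, q, k, eps) for q in range(len(points))]
    kdist = [nb[-1][0] for nb in neigh]  # distance to the k-th neighbor
    lrd = []  # local reachability density
    for nb in neigh:
        reach = [max(kdist[j], d) for d, j in nb]  # reachability distances
        lrd.append(len(nb) / max(sum(reach), 1e-12))  # guard against duplicates
    return [sum(lrd[j] for _, j in nb) / (len(nb) * lrd[i])
            for i, nb in enumerate(neigh)]
```

Calling lof_scores(points, k=5, eps=0.5) on a list of coordinate tuples returns one score per point. Scores close to 1 indicate a point about as dense as its neighbors, while scores well above 1 suggest a local outlier. Raising eps prunes more of the tree and trades neighbor accuracy for speed, which is the kind of trade-off the abstract refers to.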