Fast anomaly detection despite the duplicates

  • Authors:
  • Jay Yoon Lee; U. Kang; Danai Koutra; Christos Faloutsos

  • Affiliations:
  • Carnegie Mellon University, Pittsburgh, PA, USA; Korea Advanced Institute of Science and Technology, Daejeon, South Korea; Carnegie Mellon University, Pittsburgh, PA, USA; Carnegie Mellon University, Pittsburgh, PA, USA

  • Venue:
  • Proceedings of the 22nd international conference on World Wide Web companion
  • Year:
  • 2013

Abstract

Given a large cloud of multi-dimensional points and an off-the-shelf outlier detection method, why does it take a week to finish? After careful analysis, we discovered that duplicate points create subtle issues that the literature has ignored: if dmax is the multiplicity of the most over-plotted point, typical algorithms are quadratic in dmax. We propose several ways to eliminate the problem; we report wall-clock times and our time savings; and we show that our methods give either exact results or highly accurate approximate ones.
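
To make the cost of duplicates concrete, here is a minimal Python sketch. It is not the authors' algorithm; it only illustrates the general idea of collapsing duplicates into (distinct point, count) pairs. It uses the k-th nearest neighbor distance as a stand-in outlier score, and the function names `k_distance_naive` and `k_distance_dedup` are hypothetical. The naive version processes every copy of a heavily duplicated point separately, so dmax copies each scan the other dmax copies, which is where the quadratic term comes from. The deduplicated version visits each distinct point once and carries the multiplicity as a weight.

```python
from collections import Counter
from math import dist


def k_distance_naive(points, k):
    """Distance to the k-th nearest *other* point, treating every copy separately.

    Assumes len(points) > k. A point with dmax copies is processed dmax times,
    and each pass scans all other copies: quadratic in dmax.
    """
    out = []
    for i, p in enumerate(points):
        d = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        out.append(d[k - 1])
    return out


def k_distance_dedup(points, k):
    """Same scores, but each distinct point is handled once, weighted by its count.

    Assumes points are hashable tuples and len(points) > k.
    """
    counts = Counter(points)
    uniq = list(counts)
    kdist = {}
    for p in uniq:
        # The other copies of p are neighbors at distance 0.
        remaining = k - (counts[p] - 1)
        if remaining <= 0:
            kdist[p] = 0.0
            continue
        # Walk distinct neighbors in order of distance, consuming their counts.
        for d, c in sorted((dist(p, q), counts[q]) for q in uniq if q != p):
            remaining -= c
            if remaining <= 0:
                kdist[p] = d
                break
    # Expand back to one score per original point.
    return [kdist[p] for p in points]


if __name__ == "__main__":
    pts = [(0.0, 0.0)] * 5 + [(1.0, 0.0)] * 2 + [(5.0, 5.0)]
    print(k_distance_dedup(pts, k=3))
```

On this toy input the two functions return identical scores, which is the "exact results" flavor of the fix; the isolated point `(5.0, 5.0)` gets the largest score. The work in the deduplicated version scales with the number of distinct points rather than with the total number of points.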