Parallel Algorithms for Distance-Based and Density-Based Outliers

  • Authors:
  • Elio Lozano;Edgar Acuna

  • Affiliations:
  • University of Puerto Rico;University of Puerto Rico

  • Venue:
  • ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

An outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism. Outlier detection has many applications, such as data cleaning, fraud detection and network intrusion. The existence of outliers can indicate individuals or groups that exhibit a behavior that is very different from most of the individuals of the dataset. In this paper we design two parallel algorithms, the first one is for finding out distance-based outliers based on nested loops along with randomization and the use of a pruning rule. The second parallel algorithm is for detecting density-based local outliers. In both cases data parallelism is used. We show that both algorithms reach near linear speedup. Our algorithms are tested on four real-world datasets coming from the Machine Learning Database Repository at the UCI.