A minimum spanning tree-inspired clustering-based outlier detection technique

  • Authors:
  • Xiaochun Wang;Xia Li Wang;D. Mitch Wilkes

  • Affiliations:
  • School of Electronics and Information, Xi'an Jiaotong University, Xi'an, China;Department of Computer Science, Changan University, Xi'an, China;School of Engineering, Vanderbilt University, Nashville, TN

  • Venue:
  • ICDM'12 Proceedings of the 12th Industrial conference on Advances in Data Mining: applications and theoretical aspects
  • Year:
  • 2012

Abstract

Because outlier detection has important applications in data mining, many techniques have been developed for it. In this paper, we propose an efficient three-phase outlier detection technique. First, we modify the well-known k-means algorithm to efficiently construct a spanning tree that is very close to a minimum spanning tree of the data set. Second, the longest edges in the obtained spanning tree are removed to form clusters. Based on the intuition that the data points in small clusters are the most likely to be outliers, these points are selected as outlier candidates. Finally, density-based local outlier factors (LOF) are calculated for the outlier candidates and assessed to pinpoint the local outliers. Extensive experiments on real and synthetic data sets show that, compared with state-of-the-art methods, the proposed approach can efficiently identify global as well as local outliers in large-scale data sets.
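
The three phases can be illustrated with a minimal Python sketch. It is not the authors' implementation. In particular, it builds an exact minimum spanning tree with Prim's algorithm in place of the modified k-means construction described in the abstract, whose details are not given here. The LOF computation follows the standard definition (ignoring distance ties), and the parameter names `n_cuts`, `max_cluster_size`, `k`, and `lof_threshold` are hypothetical, chosen only for illustration.

```python
import math


def prim_mst(points):
    """Exact MST via Prim's algorithm, O(n^2). Returns (weight, i, j) edges.
    Assumption: stands in for the paper's approximate k-means-based tree."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge linking each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def cut_longest_edges(n, edges, n_cuts):
    """Drop the n_cuts longest tree edges; return clusters as index lists."""
    kept = sorted(edges)[:max(len(edges) - n_cuts, 0)]
    root = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for _, i, j in kept:
        root[find(i)] = find(j)
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def lof(points, idx, k):
    """Local outlier factor of points[idx] using k nearest neighbours.
    Brute force, ties ignored; assumes no duplicate points."""
    def knn(i):
        dists = sorted((math.dist(points[i], points[j]), j)
                       for j in range(len(points)) if j != i)
        return dists[:k]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        nb = knn(i)
        reach = [max(knn(j)[-1][0], d) for d, j in nb]
        return len(nb) / sum(reach)

    nb = knn(idx)
    return sum(lrd(j) for _, j in nb) / (len(nb) * lrd(idx))


def detect_outliers(points, n_cuts, max_cluster_size, k, lof_threshold):
    """Phase 1: spanning tree. Phase 2: cut long edges and keep small clusters
    as candidates. Phase 3: score only the candidates with LOF."""
    edges = prim_mst(points)
    clusters = cut_longest_edges(len(points), edges, n_cuts)
    candidates = [i for c in clusters if len(c) <= max_cluster_size for i in c]
    return sorted(i for i in candidates if lof(points, i, k) > lof_threshold)


# Toy data: a 3x3 unit grid plus one distant point at index 9.
grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
print(detect_outliers(grid + [(10.0, 10.0)], n_cuts=1, max_cluster_size=2,
                      k=3, lof_threshold=1.5))  # -> [9]
```

Restricting the LOF computation to points in small clusters is what keeps the final phase cheap in this sketch: the density scores are computed for a handful of candidates rather than for every point in the data set.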