Mining Top-n Local Outliers in Constrained Spatial Networks

  • Authors:
  • Chongsheng Zhang;Zhongbo Wu;Bo Qu;Hong Chen

  • Affiliations:
  • School of Information, Renmin University of China, Beijing, China 100872 and MOE Key Lab of Data Engineering and Knowledge Engineering, Beijing, China 100872;School of Information, Renmin University of China, Beijing, China 100872 and MOE Key Lab of Data Engineering and Knowledge Engineering, Beijing, China 100872;School of Information, Renmin University of China, Beijing, China 100872 and MOE Key Lab of Data Engineering and Knowledge Engineering, Beijing, China 100872;School of Information, Renmin University of China, Beijing, China 100872 and MOE Key Lab of Data Engineering and Knowledge Engineering, Beijing, China 100872

  • Venue:
  • ADMA '08 Proceedings of the 4th international conference on Advanced Data Mining and Applications
  • Year:
  • 2008

Abstract

Outlier mining, also called outlier detection, is a challenging research issue in data mining with important applications such as intrusion detection, fraud detection, and medical analysis. From the perspective of data, previous work on outlier mining has involved various types of data, such as spatial data, time series data, trajectory data, and sensor data. However, little of it has considered constrained spatial network data, in which each object must reside on or move along a certain edge. In such constrained spatial network environments, previous outlier definitions and the corresponding mining algorithms work neither properly nor efficiently. In this paper we introduce a new definition of density-based local outlier in constrained spatial networks that considers, for each object, its outlier-ness with respect to its k nearest neighbors. Moreover, to detect outliers efficiently, we propose a fast cluster-and-bound algorithm that first clusters the objects on each individual edge, then estimates the outlying degree of each cluster and prunes those clusters that cannot contain top-n outliers, thereby restricting the outlier computation to a small number of objects. Experiments on synthetic data sets demonstrate the scalability, effectiveness, and efficiency of our methods.
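
The abstract does not give the paper's formal outlier definition or the details of the cluster-and-bound pruning, so the following is only a minimal brute-force sketch of the general idea, under stated assumptions: objects sit on edges of an undirected weighted graph, distance between objects is the shortest travel distance along the network, and each object is scored by a simplified LOF-style ratio (its mean distance to its k nearest neighbors divided by the mean of the same quantity over those neighbors). The names `node_distances`, `network_distance`, and `top_n_outliers`, the `(u, v, offset)` object encoding, and the scoring ratio are all hypothetical choices for illustration, not the authors' method. No clustering or pruning is performed here; the sketch also assumes a connected network and objects at distinct positions.

```python
import heapq
from itertools import product

INF = float("inf")

def node_distances(edges):
    """All-pairs shortest-path lengths between nodes (Dijkstra from every node)."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist = {}
    for src in adj:
        best = {src: 0.0}
        heap = [(0.0, src)]
        while heap:
            d, x = heapq.heappop(heap)
            if d > best.get(x, INF):
                continue  # stale heap entry
            for y, w in adj[x]:
                nd = d + w
                if nd < best.get(y, INF):
                    best[y] = nd
                    heapq.heappush(heap, (nd, y))
        dist[src] = best
    return dist

def network_distance(a, b, edges, dist):
    """Travel distance between two objects, each encoded as (u, v, offset from u)."""
    (u1, v1, o1), (u2, v2, o2) = a, b
    # Distance from each object to the two endpoints of its own edge.
    ends_a = {u1: o1, v1: edges[(u1, v1)] - o1}
    ends_b = {u2: o2, v2: edges[(u2, v2)] - o2}
    via_nodes = min(ends_a[p] + dist[p].get(q, INF) + ends_b[q]
                    for p, q in product(ends_a, ends_b))
    if (u1, v1) == (u2, v2):
        # Same edge: walking directly along the edge may be shorter.
        return min(abs(o1 - o2), via_nodes)
    return via_nodes

def top_n_outliers(objects, edges, k, n):
    """Brute-force top-n by a simplified LOF-style ratio (assumed, not the paper's)."""
    dist = node_distances(edges)
    ids = list(objects)
    knn, avg = {}, {}
    for i in ids:
        ranked = sorted((network_distance(objects[i], objects[j], edges, dist), j)
                        for j in ids if j != i)[:k]
        knn[i] = [j for _, j in ranked]
        avg[i] = sum(d for d, _ in ranked) / k
    # Ratio > 1 means the object is sparser than its neighbors.
    score = {i: avg[i] / (sum(avg[j] for j in knn[i]) / k) for i in ids}
    return sorted(ids, key=score.get, reverse=True)[:n]

# Toy network: a path A - B - C with two edges of length 10.
edges = {("A", "B"): 10, ("B", "C"): 10}
objects = {
    "p1": ("A", "B", 1), "p2": ("A", "B", 2),
    "p3": ("A", "B", 3), "p4": ("A", "B", 4),
    "p5": ("B", "C", 8),  # isolated object far from the group
}
print(top_n_outliers(objects, edges, k=2, n=1))  # -> ['p5']
```

This baseline compares every pair of objects, which is exactly the cost that a pruning strategy such as the cluster-and-bound algorithm described above aims to avoid.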