Web outlier mining: Discovering outliers from web datasets

  • Authors:
  • Malik Agyemang; Ken Barker; Reda Alhajj

  • Affiliations:
  • Department of Computer Science, University of Calgary, 2500 University Drive N.W., Calgary, AB, Canada T2N 1N4. E-mail: agyemang,barker,alhajj@cpsc.ucalgary.ca and Department of Computer Science, ...

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2005

Abstract

Exception mining in large datasets is an important task in traditional data mining, with applications in credit card fraud detection, weather prediction, intrusion detection, and cellular phone cloning fraud detection, among others. Sifting through dynamic, unstructured, and ever-growing web data for outliers is more challenging than finding outliers in numeric datasets. Interestingly, existing outlier mining algorithms are restricted to numeric datasets, leaving web outlier mining as an open research issue. Web outliers are web data that show significantly different characteristics from other web data taken from the same category. Although the presence of web outliers appears obvious, algorithms for mining them are currently unavailable. Moreover, traditional outlier mining algorithms designed solely for numeric datasets cannot be used on web datasets, because web datasets typically contain multimedia. This paper establishes the presence of outliers on the web, called web outliers, and proposes a general framework for mining them. A web outlier taxonomy is presented that supports the development of content-specific algorithms for mining web outliers. Finally, we propose the WCO-Mine algorithm for mining web content outliers. Experimental results demonstrate that WCO-Mine is capable of finding web outliers in web datasets.
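
The abstract's definition of a web outlier (a page whose characteristics differ markedly from other pages in the same category) can be illustrated with a small sketch. This is not WCO-Mine, whose details are not given in this record. It is a generic stand-in that assumes plain word-count vectors and cosine similarity, and every name in it (term_vector, cosine, rank_by_dissimilarity, the sample pages) is hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercased word counts for the text of one page."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_dissimilarity(pages):
    """Return (index, dissimilarity) pairs, most dissimilar page first.

    Each page is compared with the summed word counts of all the
    *other* pages in the same category (a leave-one-out profile).
    """
    vectors = [term_vector(p) for p in pages]
    total = Counter()
    for v in vectors:
        total.update(v)
    scores = []
    for i, v in enumerate(vectors):
        rest = total - v  # category profile without page i
        scores.append((i, 1.0 - cosine(v, rest)))
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Made-up pages, all assumed to be filed under one "travel" category.
pages = [
    "cheap flights and hotel deals for your holiday travel",
    "book flights hotel and car rental for holiday travel",
    "holiday travel deals on flights and hotel packages",
    "protein folding simulation with molecular dynamics",
]
ranked = rank_by_dissimilarity(pages)
print(ranked[0][0])  # index of the page least like its category
```

Here the fourth page shares no vocabulary with the rest of the category, so it ranks first. A real web content miner would also need to handle markup, multimedia, and term weighting, which this sketch deliberately ignores.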