Framework for mining web content outliers

  • Authors:
  • Malik Agyemang, Ken Barker, Reda Alhajj

  • Affiliations:
  • University of Calgary, Calgary, Alberta, Canada (all three authors)

  • Venue:
  • Proceedings of the 2004 ACM symposium on Applied computing
  • Year:
  • 2004

Abstract

Outliers are data objects whose characteristics differ from those of other data objects. Exploring diverse and dynamic web data for outliers is more interesting than finding outliers in numeric data sets. Existing web mining algorithms, however, have concentrated on finding frequent patterns while discarding the less frequent ones, which are the patterns most likely to contain outlying data. This paper refers to outliers present on the web as web outliers to distinguish them from traditional outliers. Web outliers are data objects whose characteristics differ significantly from those of other web data. Although the presence of web outliers appears obvious, there is neither a formal definition of web outliers nor an algorithm for mining them. Moreover, traditional outlier mining algorithms, designed solely for numeric data sets, are inappropriate for mining web outliers. This paper establishes the presence of web outliers and discusses some practical applications of web outlier mining. Finally, we present a taxonomy of web outliers and propose a general framework for mining web content outliers.