FindWDO: a k-nearest neighbors approach for detecting Web document outliers

  • Authors:
  • Ayman Tanira;Ahmed Rafea;Hesham Hassan

  • Affiliations:
  • Palestine Technical College, Deir El-Ballah, Gaza Strip, Palestine;American University-Egypt, Cairo, Egypt;Cairo University, Giza, Egypt

  • Venue:
  • ACST '08 Proceedings of the Fourth IASTED International Conference on Advances in Computer Science and Technology
  • Year:
  • 2008

Abstract

Web content outliers are Web documents whose contents differ from those of other Web documents taken from the same category. Mining Web content outliers can be applied to identifying competitors, detecting emerging business patterns in e-commerce, and cleaning corpora used in Web document classification. This paper proposes a k-nearest neighbors approach (FindWDO) for detecting Web document outliers. Experimental results show that FindWDO outperforms a similar algorithm in the same domain.
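
The abstract does not give the details of FindWDO, so the following is only a generic sketch of how a k-nearest neighbors outlier score over documents might look, not the authors' algorithm. It assumes whitespace tokenization, raw term-frequency vectors, cosine distance, and a score equal to the mean distance to the k nearest documents; the function names, the value of k, and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Build a term-frequency vector (assumed representation, not from the paper)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def knn_outlier_scores(docs, k=2):
    """Score each document by its mean distance to its k nearest neighbors.

    A higher score means the document is farther from the rest of its
    category. This is a generic kNN-distance heuristic, assumed here for
    illustration only.
    """
    vectors = [tf_vector(d) for d in docs]
    scores = []
    for i, v in enumerate(vectors):
        dists = sorted(
            cosine_distance(v, w) for j, w in enumerate(vectors) if j != i
        )
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest) if nearest else 0.0)
    return scores


# Hypothetical toy category: three travel pages and one unrelated page.
docs = [
    "cheap flights hotel booking travel deals",
    "travel deals cheap hotel flights",
    "hotel booking travel flights offers",
    "quantum chromodynamics lattice gauge theory",
]
scores = knn_outlier_scores(docs, k=2)
outlier = max(range(len(docs)), key=scores.__getitem__)
```

In this toy category the fourth document shares no terms with the others, so it receives the highest score and is flagged as the outlier candidate. A real system would likely need stop-word removal, term weighting, and a principled threshold, none of which the abstract specifies.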