Web page classification based on k-nearest neighbor approach

  • Authors:
  • Oh-Woog Kwon;Jong-Hyeok Lee

  • Affiliations:
  • Dept. of Computer Science and Engineering, Pohang University of Science and Technology, San 31 Hyoja Dong, Pohang, 790-784, Korea;Dept. of Computer Science and Engineering, Pohang University of Science and Technology, San 31 Hyoja Dong, Pohang, 790-784, Korea

  • Venue:
  • IRAL '00 Proceedings of the fifth international workshop on Information retrieval with Asian languages
  • Year:
  • 2000

Abstract

Automatic categorization is the only viable method to deal with the scaling problem of the World Wide Web. In this paper, we propose a Web page classifier based on an adaptation of the k-Nearest Neighbor (k-NN) approach. To improve the performance of the k-NN approach, we supplement it with a feature selection method and a term-weighting scheme that uses markup tags, and we reformulate the document-document similarity measure used in the vector space model. In our experiments on a Korean commercial Web directory, the proposed methods improved the classification performance of the k-NN approach.
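
As a rough illustration of the general idea (not the authors' implementation), the sketch below combines tag-based term weighting with a similarity-weighted k-NN vote. The tag weights, the use of plain cosine similarity, and all names (`TAG_WEIGHTS`, `weighted_vector`, `knn_classify`, the toy `TRAINING` data) are assumptions made for this example; the abstract does not specify the actual weights, the feature selection method, or the reformulated similarity measure.

```python
import math
from collections import Counter, defaultdict

# Hypothetical tag weights: the abstract does not state the real values.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "body": 1.0}


def weighted_vector(tagged_terms):
    """Build a sparse term vector from (tag, term) pairs.

    Each occurrence contributes the weight of the markup tag it appears in,
    so a term in <title> counts more than the same term in the body.
    """
    vec = defaultdict(float)
    for tag, term in tagged_terms:
        vec[term] += TAG_WEIGHTS.get(tag, 1.0)
    return dict(vec)


def cosine(a, b):
    """Standard cosine similarity between two sparse vectors (assumed here;
    the paper uses its own reformulated measure)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(query, training, k=3):
    """Return the category with the largest summed similarity among the
    k training pages most similar to the query vector."""
    scored = sorted(
        ((cosine(query, vec), label) for vec, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = Counter()
    for sim, label in scored:
        votes[label] += sim
    return votes.most_common(1)[0][0] if votes else None


# Toy training set, invented purely for illustration.
TRAINING = [
    (weighted_vector([("title", "soccer"), ("body", "league")]), "sports"),
    (weighted_vector([("title", "stock"), ("body", "market")]), "finance"),
    (weighted_vector([("body", "soccer"), ("body", "goal")]), "sports"),
]

query = weighted_vector([("title", "soccer"), ("body", "goal")])
predicted = knn_classify(query, TRAINING, k=2)
```

Summing similarities rather than counting neighbors is one common way to score categories in k-NN text classification; a plain majority vote would be an equally valid choice for this sketch.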