Classification of Web Documents Using a Graph Model

  • Authors:
  • Adam Schenker;Mark Last;Horst Bunke;Abraham Kandel

  • Venue:
  • ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1
  • Year:
  • 2003

Abstract

In this paper we describe work relating to classification of web documents using a graph-based model instead of the traditional vector-based model for document representation. We compare the classification accuracy of the vector model approach using the k-Nearest Neighbor (k-NN) algorithm to a novel approach which allows the use of graphs for document representation in the k-NN algorithm. The proposed method is evaluated on three different web document collections using the leave-one-out approach for measuring classification accuracy. The results show that the graph-based k-NN approach can outperform traditional vector-based k-NN methods in terms of both accuracy and execution time.
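
To make the idea of k-NN over graphs concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify how documents become graphs or how graph distance is measured, so both are assumptions here: each document is taken to be a graph whose nodes are its unique words and whose directed edges connect adjacent words, and distance is taken to be one minus the size of the shared part divided by the size of the larger graph. The function names `build_graph`, `graph_distance`, `knn_classify`, and `leave_one_out_accuracy` are hypothetical.

```python
from collections import Counter


def build_graph(text):
    """Assumed representation: nodes are unique words, edges link adjacent words."""
    words = text.lower().split()
    nodes = set(words)
    edges = set(zip(words, words[1:]))  # directed (word, next_word) pairs
    return nodes, edges


def graph_distance(g1, g2):
    """Assumed distance: 1 - |shared nodes and edges| / size of the larger graph."""
    nodes1, edges1 = g1
    nodes2, edges2 = g2
    common = len(nodes1 & nodes2) + len(edges1 & edges2)
    largest = max(len(nodes1) + len(edges1), len(nodes2) + len(edges2))
    return 1.0 - common / largest if largest else 0.0


def knn_classify(query, training, k=3):
    """Majority vote over the k training graphs closest to the query graph.

    `training` is a list of (graph, label) pairs.
    """
    neighbors = sorted(training, key=lambda item: graph_distance(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(dataset, k=3):
    """Classify each document against all the others and report the hit rate."""
    correct = 0
    for i, (graph, label) in enumerate(dataset):
        rest = dataset[:i] + dataset[i + 1:]
        if knn_classify(graph, rest, k) == label:
            correct += 1
    return correct / len(dataset)
```

With unique node labels, the shared part of two graphs reduces to set intersections of nodes and edges, which keeps each distance computation cheap. Whether this matches the distance measure or graph construction used in the paper would need to be checked against the full text.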