Supervised web document classification using discrete transforms, active hypercontours and expert knowledge

  • Authors:
  • P. S. Szczepaniak; A. Tomczyk; M. Pryczek

  • Affiliations:
  • Institute of Computer Science, Technical University of Lodz, Lodz, Poland and Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland; Institute of Computer Science, Technical University of Lodz, Lodz, Poland; Institute of Computer Science, Technical University of Lodz, Lodz, Poland

  • Venue:
  • WImBI'06 Proceedings of the 1st WICI international conference on Web intelligence meets brain informatics
  • Year:
  • 2006

Abstract

In this paper, a new method of supervised document classification is proposed. It uses discrete transforms to extract features from the classified objects and adopts adaptive potential active hypercontours (APAH) for document classification. The idea of APAH generalizes the classic contour methods of image segmentation. It has two main advantages: it can use almost any knowledge during the search for an optimal classification function, and it can operate in a feature space where only a metric is defined. Both advantages are exploited here: the first by using expert knowledge about the significance of documents in the training set, and the second by inducing new metrics in feature spaces. The method has been evaluated on a subset of the Open Directory Project (ODP) database and compared with k-NN, a well-known classification technique.
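
As a rough illustration of the ingredients named in the abstract, the following Python sketch combines a discrete transform for feature extraction, a potential-based classifier that needs only a metric and accepts per-document expert weights, and a k-NN baseline. It is a minimal sketch under stated assumptions, not the authors' formulation: the choice of the DCT-II as the transform, the potential form weight / (1 + mu * d^2), and all function and parameter names (`dct2`, `potential_classify`, `knn_classify`, `mu`) are hypothetical, and the adaptive search for an optimal classification function is omitted.

```python
import math
from collections import defaultdict


def dct2(signal):
    """Unnormalized DCT-II, used here as one example of a discrete transform."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(signal))
        for k in range(n)
    ]


def euclidean(a, b):
    """Default metric; any other metric on the feature space can be passed in."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def potential_classify(x, train, metric=euclidean, mu=1.0):
    """Assign x to the class with the largest summed potential.

    train is a list of (features, label, weight) triples. The weight stands in
    for expert knowledge about a document's significance (an assumption of
    this sketch). Only distances are used, so a metric is all that is needed.
    """
    totals = defaultdict(float)
    for features, label, weight in train:
        d = metric(x, features)
        totals[label] += weight / (1.0 + mu * d * d)
    return max(totals, key=totals.get)


def knn_classify(x, train, k=3, metric=euclidean):
    """Plain k-NN majority vote over the same training triples (weights ignored)."""
    nearest = sorted(train, key=lambda t: metric(x, t[0]))[:k]
    votes = defaultdict(int)
    for _, label, _ in nearest:
        votes[label] += 1
    return max(votes, key=votes.get)


# Toy one-dimensional data: equal weights versus an expert-boosted document.
equal = [([0.0], "a", 1.0), ([1.0], "b", 1.0)]
boosted = [([0.0], "a", 1.0), ([1.0], "b", 3.0)]
label_equal = potential_classify([0.4], equal)
label_boosted = potential_classify([0.4], boosted)
label_knn = knn_classify([0.4], equal, k=1)
```

In this toy setup, raising the weight of a single training document is enough to change the decision of the potential-based classifier for a query point, whereas the k-NN baseline has no such weights and depends only on which neighbours are closest.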