Text categorization of commercial Web pages

  • Authors:
  • E. Binaghi;M. Carullo;I. Gallo;M. Madaio

  • Affiliations:
  • Università degli Studi dell'Insubria, Varese, Italy;Università degli Studi dell'Insubria, Varese, Italy;Università degli Studi dell'Insubria, Varese, Italy;Università degli Studi dell'Insubria, Varese, Italy

  • Venue:
  • AIA '08 Proceedings of the 26th IASTED International Conference on Artificial Intelligence and Applications
  • Year:
  • 2008

Abstract

In this paper we describe a new on-line document categorization strategy that can be integrated within Web applications. A salient aspect is the use of neural learning in both the representation and the classification tasks. Within text documents conceived as images, the regions of interest (RoI) containing information meaningful for categorization are identified with the support of a supervised neural network. Text within each RoI is represented according to a simple solution that considers the first K words in the text and codes them properly. A Kohonen Self-Organizing Map (SOM) is applied to cluster documents, which are subsequently labelled by applying a simple majority voting mechanism. The adopted solutions were evaluated by conducting experiments within the context of on-line price comparison services. The results obtained demonstrate that the overall classification strategy is able to categorize documents satisfactorily, taking into account the high variability of Web pages.
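
The clustering and labelling steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the first K words are coded, so the hashed word-count encoding here is an assumption, as are the map size, the learning-rate and radius schedules, and all function names. The supervised RoI detection step is omitted, and the toy documents are hypothetical.

```python
import math
import random
import zlib
from collections import Counter, defaultdict


def encode_first_k_words(text, k=8, dim=32):
    """Hash the first k words into a unit-length vector (assumed encoding)."""
    vec = [0.0] * dim
    for word in text.lower().split()[:k]:
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def best_matching_unit(weights, vec):
    """Return the index of the map unit whose weights are closest to vec."""
    def dist2(w):
        return sum((a - b) ** 2 for a, b in zip(w, vec))
    return min(range(len(weights)), key=lambda i: dist2(weights[i]))


def train_som(vectors, rows=3, cols=3, epochs=50, seed=0):
    """Train a small rectangular SOM; returns a flat list of unit weights."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1.0 - frac)  # linearly decaying learning rate
        radius = max(rows, cols) / 2.0 * (1.0 - frac) + 0.5  # shrinking radius
        for vec in rng.sample(vectors, len(vectors)):  # shuffled pass
            br, bc = divmod(best_matching_unit(weights, vec), cols)
            for i, w in enumerate(weights):
                r, c = divmod(i, cols)
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * radius ** 2))  # Gaussian neighbourhood
                for j in range(dim):
                    w[j] += lr * h * (vec[j] - w[j])
    return weights


def label_units(weights, vectors, labels):
    """Give each unit the majority label of the training vectors it wins."""
    votes = defaultdict(Counter)
    for vec, label in zip(vectors, labels):
        votes[best_matching_unit(weights, vec)][label] += 1
    return {unit: counts.most_common(1)[0][0] for unit, counts in votes.items()}


def classify(weights, unit_labels, vec):
    """Label vec with the category of the nearest labelled unit."""
    def dist2(i):
        return sum((a - b) ** 2 for a, b in zip(weights[i], vec))
    return unit_labels[min(sorted(unit_labels), key=dist2)]


if __name__ == "__main__":
    # Hypothetical toy documents standing in for text extracted from RoIs.
    docs = [
        ("digital camera 12 megapixel zoom lens", "cameras"),
        ("compact camera optical zoom lens kit", "cameras"),
        ("laptop 15 inch intel processor ssd", "computers"),
        ("notebook laptop amd processor ssd ram", "computers"),
    ]
    vectors = [encode_first_k_words(text) for text, _ in docs]
    labels = [label for _, label in docs]
    som = train_som(vectors)
    unit_labels = label_units(som, vectors, labels)
    query = encode_first_k_words("camera with zoom lens")
    print(classify(som, unit_labels, query))
```

Units that win no training document receive no label in this sketch, so `classify` falls back to the nearest labelled unit; other policies for unlabelled units are equally plausible.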