Cybergenre: automatic identification of home pages on the web

  • Authors: Michael Shepherd; Carolyn Watters; Alistair Kennedy
  • Affiliations: Dalhousie University, Halifax, Canada (all authors)
  • Venue: Journal of Web Engineering
  • Year: 2004

Abstract

The research reported in this paper is part of a larger project on the automatic classification of web pages by genre. The long-term goal is to incorporate web page genre into the search process to improve the quality of search results. In this phase, a neural net classifier was trained to distinguish home pages from non-home pages and to classify those home pages as personal, corporate, or organization home pages. To evaluate the importance of the functionality attribute of cybergenre in such classification, the web pages were characterized by the cybergenre attributes 〈content, form, functionality〉, and the resulting classifications were compared with classifications in which the web pages were characterized by the genre attributes 〈content, form〉. The results indicate that the classifier can distinguish home pages from non-home pages and, within the home page genre, can distinguish personal from corporate home pages. Organization home pages, however, were more difficult to distinguish from personal and corporate home pages. Including the functionality attribute significantly improved the identification of personal and corporate home pages.