Is Web Genre Identification Feasible?

  • Authors:
  • Benno Stein;Sven Meyer zu Eissen

  • Affiliations:
  • Faculty of Media / Media Systems. Bauhaus University Weimar, Germany. benno.stein@medien.uni-weimar.de;Faculty of Media / Media Systems. Bauhaus University Weimar, Germany. benno.stein@medien.uni-weimar.de

  • Venue:
  • Proceedings of the 2006 conference on ECAI 2006: 17th European Conference on Artificial Intelligence, August 29 -- September 1, 2006, Riva del Garda, Italy
  • Year:
  • 2006

Abstract

This paper contributes to a facet of Web Information Retrieval that has recently received much attention: the satisfaction of a user's personal information need with respect to text type, presentation type, or information quality. We assume that such properties can be quantified for all kinds of Web documents, and we subsume them under the term “Web genre” or “genre”. Recent surveys show that there is, to a certain degree, a common understanding of Web genre. However, the strictness with which genre and non-genre aspects of a document are experienced is an individual matter. To better understand the challenges of Web genre identification and its possible limits, we investigate a question that has not been posed so far: Given a categorization C of documents (or bookmarks, links, document identifiers), can we provide a reliable assessment of whether C is governed by topic or by genre considerations?