Towards language-independent web genre detection

  • Authors:
  • Philipp Scholl;Renato Domínguez García;Doreen Böhnstedt;Christoph Rensing;Ralf Steinmetz

  • Affiliations:
  • Technische Universität Darmstadt, Darmstadt, Germany;Technische Universität Darmstadt, Darmstadt, Germany;Technische Universität Darmstadt, Darmstadt, Germany;Technische Universität Darmstadt, Darmstadt, Germany;Technische Universität Darmstadt, Darmstadt, Germany

  • Venue:
  • Proceedings of the 18th International Conference on World Wide Web
  • Year:
  • 2009

Abstract

The term web genre denotes the type of a given web resource, as opposed to the topic of its content. In this research, we focus on recognizing the web genres blog, wiki, and forum. We present a set of features that exploit the hierarchical structure of the web page's HTML mark-up and thus, in contrast to related approaches, do not depend on a linguistic analysis of the page's content. Our results show that very good accuracy can be achieved for fully language-independent detection of structured web genres.
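
To make the idea of structure-only features concrete, here is a minimal sketch of how a page's mark-up hierarchy could be summarized without reading its text. It is an illustration under stated assumptions, not the feature set used in the paper: the names `StructureCollector` and `structural_features`, as well as the four features chosen, are hypothetical. It relies only on Python's standard `html.parser` module.

```python
from collections import Counter
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class StructureCollector(HTMLParser):
    """Records tag counts and nesting depth; text content is never inspected."""

    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag not in VOID_TAGS:
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        # Guard against stray closing tags in malformed pages.
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


def structural_features(html):
    """Return illustrative (hypothetical) structure-only features of a page."""
    collector = StructureCollector()
    collector.feed(html)
    collector.close()
    total = sum(collector.tag_counts.values())
    return {
        "total_tags": total,
        "max_depth": collector.max_depth,
        "distinct_tags": len(collector.tag_counts),
        # Share of the most frequent tag; repeated blocks such as forum
        # posts or blog comments would plausibly raise this value.
        "most_repeated_tag_ratio": (
            max(collector.tag_counts.values()) / total if total else 0.0
        ),
    }
```

Because the text between tags is ignored, two pages with identical mark-up but content in different languages yield the same feature values, which is the property the abstract refers to as language independence. A dictionary like this could then be fed to any standard classifier; which classifier and which features work best is a question the paper itself addresses.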