Web corpus mining by instance of Wikipedia

  • Authors:
  • Rüdiger Gleim;Alexander Mehler;Matthias Dehmer

  • Affiliations:
  • Bielefeld University, Bielefeld, Germany;Bielefeld University, Bielefeld, Germany;Technische Universität Darmstadt

  • Venue:
  • WAC '06 Proceedings of the 2nd International Workshop on Web as Corpus
  • Year:
  • 2006

Abstract

In this paper we present an approach to structure learning for web documents, with the aim of advancing webgenre tagging in web corpus linguistics. A central outcome of the paper is that purely structure-oriented approaches to web document classification provide an information gain that may be utilized in combined approaches of web content and structure analysis.