Structure-sensitive learning of text types

  • Authors:
  • Peter Geibel;Ulf Krumnack;Olga Pustylnikov;Alexander Mehler;Helmar Gust;Kai-Uwe Kühnberger

  • Affiliations:
  • University of Osnabrück, Institute of Cognitive Science, AI Group, Germany;University of Osnabrück, Institute of Cognitive Science, AI Group, Germany;University of Bielefeld, Text Technology Group, Germany;University of Bielefeld, Text Technology Group, Germany;University of Osnabrück, Institute of Cognitive Science, AI Group, Germany;University of Osnabrück, Institute of Cognitive Science, AI Group, Germany

  • Venue:
  • AI'07 Proceedings of the 20th Australian joint conference on Advances in artificial intelligence
  • Year:
  • 2007

Abstract

In this paper, we discuss the classification of documents based on their logical document structure, i.e., their DOM trees. We describe a method that uses predefined structural features, as well as four tree kernels suitable for such structures. We evaluate the methods experimentally on a corpus containing the DOM trees of newspaper articles and on the well-known SUSANNE corpus. We demonstrate that, for both corpora, many text types can be learned from structural features alone.
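
To make the two ingredients named in the abstract more concrete, the following is a minimal sketch of what structural features and a kernel over DOM trees might look like. It is not the authors' method: the abstract does not specify the predefined features or the four tree kernels, so the feature set (`structural_features`), the parent-child edge kernel (`edge_kernel`), and the toy documents are all assumptions chosen for illustration. Only element tags and tree shape are used, never the text content.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def structural_features(root):
    """Shape features of a DOM tree (hypothetical feature set, for illustration)."""
    n_nodes = n_leaves = max_depth = 0
    stack = [(root, 1)]  # (element, depth), root has depth 1
    while stack:
        node, depth = stack.pop()
        n_nodes += 1
        max_depth = max(max_depth, depth)
        children = list(node)
        if not children:
            n_leaves += 1
        stack.extend((child, depth + 1) for child in children)
    inner = n_nodes - n_leaves
    # Every non-root node is the child of exactly one inner node.
    mean_branching = (n_nodes - 1) / inner if inner else 0.0
    return {
        "nodes": n_nodes,
        "leaves": n_leaves,
        "depth": max_depth,
        "mean_branching": mean_branching,
    }


def edge_counts(root):
    """Count (parent tag, child tag) pairs in the tree."""
    counts = Counter()
    for parent in root.iter():
        for child in parent:
            counts[(parent.tag, child.tag)] += 1
    return counts


def edge_kernel(tree_a, tree_b):
    """Dot product of edge-count vectors: a very simple kernel on trees.

    This is an assumed stand-in, not one of the paper's four kernels.
    """
    ca, cb = edge_counts(tree_a), edge_counts(tree_b)
    return sum(ca[edge] * cb[edge] for edge in ca.keys() & cb.keys())


# Toy documents (invented); only the tag structure matters here.
doc_a = ET.fromstring("<article><head/><body><p/><p/></body></article>")
doc_b = ET.fromstring("<article><body><p/><list><item/></list></body></article>")

features_a = structural_features(doc_a)
similarity = edge_kernel(doc_a, doc_b)
```

In a setup like this, feature dictionaries could be fed to any standard classifier, while the kernel values could fill a Gram matrix for a kernel method such as a support vector machine. Which learner the paper pairs with its features and kernels is not stated in the abstract.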