Automatic recognition of German news focusing on future-directed beliefs and intentions

  • Authors:
  • Judith Eckle-Kohler;Michael Kohler;Jens Mehnert

  • Affiliations:
  • Darmstadt University of Technology, Department of Mathematics, Schloígartenstrasse 7, 64289 Darmstadt, Germany;Darmstadt University of Technology, Department of Mathematics, Schloígartenstrasse 7, 64289 Darmstadt, Germany;Darmstadt University of Technology, Department of Mathematics, Schloígartenstrasse 7, 64289 Darmstadt, Germany

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2008

Abstract

We consider the classification of German news stories as either focusing on future-directed beliefs and intentions or lacking these. The method proposed in this article requires only a small set of labeled training data: we introduce German clues for the automatic identification of future-orientation, which are used for automatic labeling of Reuters news stories. We describe the development of a high-precision procedure for automatic labeling in a bootstrapping fashion. A first version of the labeling procedure uses the absence of clues for future-directedness as an indicator of non-future-directedness and is able to automatically label about one-third of the Reuters news stories with high precision. A perceptron is then applied to the automatically labeled news stories in order to semi-automatically acquire an additional set of clues for non-future-directedness. The second version of the labeling procedure additionally uses these clues and achieves markedly improved recall; it can even be extended by a guessing step to perform classification with an error of 22.5%. We also investigate another way to increase recall: using the automatically labeled news stories as training data for statistical classifiers. Three different types of statistical classifiers are applied in order to determine which classifier is best suited to the text classification task considered. The best statistical classifier, combined with the results of the improved automatic labeling, is able to recognize the two classes of news stories with an error of 19%.
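
The bootstrapping idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the clue list `FUTURE_CLUES`, the `min_hits` threshold, the bag-of-words features, and all function names are assumptions made for illustration, since the abstract does not list the actual clues or the feature representation. The sketch labels a story only when the clue evidence is clear (and abstains otherwise), then trains a plain perceptron on the automatically labeled stories and reads off the most negatively weighted words as candidate clues for non-future-directedness.

```python
from collections import Counter

# Hypothetical German clue words for future-orientation (illustrative only;
# the clues actually used in the paper are not given in the abstract).
FUTURE_CLUES = {"wird", "werden", "will", "wollen", "plant", "erwartet", "prognose"}

PUNCTUATION = ".,;:!?\"()"


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    tokens = (raw.strip(PUNCTUATION).lower() for raw in text.split())
    return [t for t in tokens if t]


def label_v1(text, min_hits=2):
    """First-pass labeling (assumed rule): enough clue hits -> 'future',
    no clue hits at all -> 'non_future', anything in between -> None (abstain)."""
    hits = sum(1 for t in tokenize(text) if t in FUTURE_CLUES)
    if hits >= min_hits:
        return "future"
    if hits == 0:
        return "non_future"
    return None


def train_perceptron(examples, epochs=10):
    """Plain perceptron over binary bag-of-words features.
    examples: list of (tokens, y) pairs with y in {+1, -1}."""
    weights = Counter()
    bias = 0
    for _ in range(epochs):
        for tokens, y in examples:
            features = set(tokens)
            score = bias + sum(weights[t] for t in features)
            if y * score <= 0:  # misclassified (or on the boundary): update
                for t in features:
                    weights[t] += y
                bias += y
    return weights, bias


def candidate_non_future_clues(weights, top_n=3):
    """Words with the most negative weights, as candidates for manual review."""
    negative = [(w, t) for t, w in weights.items() if w < 0]
    return [t for w, t in sorted(negative)[:top_n]]


def bootstrap(stories):
    """Label stories automatically, then train on the confidently labeled ones."""
    labeled = [(tokenize(s), label_v1(s)) for s in stories]
    examples = [(toks, 1 if lab == "future" else -1)
                for toks, lab in labeled if lab is not None]
    return train_perceptron(examples)
```

In this sketch the perceptron only proposes candidates. Choosing which of them to keep as clues corresponds to the semi-automatic acquisition step mentioned in the abstract and is left to a human reviewer.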