Learning information structure in the Prague treebank

  • Authors: Oana Postolache
  • Affiliations: University of Saarland, Saarbrücken, Germany
  • Venue: ACLstudent '05 Proceedings of the ACL Student Research Workshop
  • Year: 2005

Abstract

This paper investigates the automatic identification of aspects of Information Structure (IS) in texts. The experiments use the Prague Dependency Treebank, which is annotated with IS following the Praguian approach of Topic-Focus Articulation. We automatically detect t(opic) and f(ocus), using node attributes from the treebank as basic features, together with derived features inspired by the annotation guidelines. We report the performance of C4.5, Bagging, and Ripper classifiers on several classes of instances: nouns and pronouns together, nouns only, and pronouns only. A baseline system that always assigns f(ocus) has an F-score of 42.5%. Our best system obtains 82.04%.
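To make the baseline concrete, the following is a minimal sketch of how a system that always assigns f(ocus) can be scored. The toy label sequence, the helper name `f_score`, and the macro-averaging step are assumptions for illustration only; the abstract does not state how the reported F-score is averaged, and the numbers below are not the paper's data.

```python
from typing import Sequence


def f_score(gold: Sequence[str], pred: Sequence[str], label: str) -> float:
    """F1 for one label, computed from true/false positives and false negatives."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        # No correct predictions for this label: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold labels ("t" = topic, "f" = focus); not taken from the treebank.
gold = ["f", "f", "t", "f", "t", "f", "t", "f"]

# The baseline ignores all features and always predicts focus.
baseline = ["f"] * len(gold)

f_focus = f_score(gold, baseline, "f")
f_topic = f_score(gold, baseline, "t")

# Assumed averaging scheme: unweighted mean over the two labels.
macro_f = (f_focus + f_topic) / 2

print(f"F(focus)={f_focus:.3f}  F(topic)={f_topic:.3f}  macro={macro_f:.3f}")
```

Because the baseline never predicts t(opic), its F-score for that label is zero, which is one reason a constant-label system can score well below its raw accuracy once both labels are taken into account.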