Learning information structure in the Prague treebank

Authors:
Oana Postolache
Affiliations:
University of Saarland, Saarbrücken, Germany
Venue:
ACLstudent '05 Proceedings of the ACL Student Research Workshop
Year:
2005

Abstract

This paper investigates the automatic identification of aspects of Information Structure (IS) in texts. The experiments use the Prague Dependency Treebank, which is annotated with IS following the Praguian approach of Topic-Focus Articulation. We automatically detect t(opic) and f(ocus), using node attributes from the treebank as basic features, together with derived features inspired by the annotation guidelines. We report the performance of C4.5, Bagging, and Ripper classifiers on several classes of instances, such as nouns and pronouns together, only nouns, and only pronouns. A baseline system that always assigns f(ocus) has an F-score of 42.5%. Our best system obtains 82.04%.
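To make the baseline concrete, the following is a minimal Python sketch of an always-f(ocus) labeller scored with per-class and macro-averaged F1. The abstract does not say how its F-score was averaged, so the macro average here is an assumption. The gold labels are invented toy data, not treebank data, and the names `f_score` and `always_focus` are hypothetical helpers introduced only for illustration.

```python
def f_score(gold, predicted, label):
    """Return the F1 score for one label, or 0.0 when there are no true positives."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        # Precision or recall is zero (or undefined), so F1 is taken as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def always_focus(instances):
    """Hypothetical baseline: label every instance as f(ocus)."""
    return ["f"] * len(instances)


# Toy gold labels, invented for illustration (t = topic, f = focus).
gold = ["f", "t", "f", "f", "t", "t", "f", "t"]
predicted = always_focus(gold)

focus_f1 = f_score(gold, predicted, "f")
topic_f1 = f_score(gold, predicted, "t")
# Assumption: macro average over the two classes; the paper may average differently.
macro_f1 = (focus_f1 + topic_f1) / 2

print(f"focus F1 = {focus_f1:.3f}, topic F1 = {topic_f1:.3f}, macro F1 = {macro_f1:.3f}")
```

Because the baseline never predicts t(opic), its topic F1 is zero on any data set, which is why an averaged score for such a baseline can sit well below its raw accuracy.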