Data-driven approaches for information structure identification

Authors:
Oana Postolache;Ivana Kruijff-Korbayová;Geert-Jan M. Kruijff
Affiliations:
University of Saarland, Saarbrücken, Germany;University of Saarland, Saarbrücken, Germany;German Research Center for Artificial Intelligence (DFKI GmbH), Saarbrücken, Germany
Venue:
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Year:
2005

Abstract

This paper investigates the automatic identification of Information Structure (IS) in texts. The experiments use the Prague Dependency Treebank, which is annotated with IS following the Praguian approach of Topic-Focus Articulation. We automatically detect t(opic) and f(ocus), using node attributes from the treebank as basic features, together with derived features inspired by the annotation guidelines. We present the performance of decision tree (C4.5), maximum entropy, and rule induction (RIPPER) classifiers on all tectogrammatical nodes. We compare the results against a baseline system that always assigns f(ocus) and against a rule-based system. The best system achieves an accuracy of 90.69%, which is a 44.73% relative improvement over the baseline accuracy of 62.66% (an absolute gain of 28.03 percentage points).
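
The improvement figure reported in the abstract is measured relative to the baseline accuracy, not as a difference in percentage points. The following is a minimal sketch of that arithmetic, using only the two accuracies stated above; the helper name `relative_improvement` is hypothetical and is not taken from the paper.

```python
def relative_improvement(system_acc: float, baseline_acc: float) -> float:
    """Return the gain of system_acc over baseline_acc as a percentage of the baseline."""
    return (system_acc - baseline_acc) / baseline_acc * 100.0


# Accuracies as reported in the abstract (in percent).
baseline_acc = 62.66  # baseline that always assigns f(ocus)
best_acc = 90.69      # best-performing system

absolute_gain = best_acc - baseline_acc                      # percentage points
relative_gain = relative_improvement(best_acc, baseline_acc)  # percent of baseline

print(f"absolute gain: {absolute_gain:.2f} percentage points")
print(f"relative gain: {relative_gain:.2f}%")
```

Reading the figure as a relative gain reconciles the two numbers: the absolute difference is about 28 percentage points, which is roughly 44.73% of the baseline accuracy.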