This paper investigates the automatic identification of aspects of Information Structure (IS) in texts. The experiments use the Prague Dependency Treebank, which is annotated with IS following the Praguian approach of Topic-Focus Articulation. We automatically detect t(opic) and f(ocus), using node attributes from the treebank as basic features, together with derived features inspired by the annotation guidelines. We report the performance of C4.5, Bagging, and Ripper classifiers on several classes of instances, such as nouns and pronouns together, nouns only, and pronouns only. A baseline system that always assigns f(ocus) has an F-score of 42.5%. Our best system obtains 82.04%.
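To make the baseline comparison concrete, the following is a minimal sketch of how an always-f(ocus) baseline can be scored with an F-score. The abstract does not say how the F-score is averaged over the two classes, so the macro-average used here is an assumption, and the function names and the toy label sequence are hypothetical illustrations, not data or code from the paper.

```python
def f1_per_class(gold, pred, label):
    """F1 for one label, from true positives, false positives, false negatives."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        # No correct predictions for this label: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred, labels=("t", "f")):
    """Unweighted mean of per-class F1 (assumed averaging, not confirmed by the source)."""
    return sum(f1_per_class(gold, pred, label) for label in labels) / len(labels)


# Hypothetical toy annotation: "t" = topic, "f" = focus.
gold = ["f", "t", "f", "f", "t", "f", "t", "f"]

# Baseline: assign f(ocus) to every instance.
baseline = ["f"] * len(gold)

print("F1 for f:", f1_per_class(gold, baseline, "f"))
print("F1 for t:", f1_per_class(gold, baseline, "t"))
print("macro F1:", macro_f1(gold, baseline))
```

Because the baseline never predicts t(opic), its F1 for that class is zero, which pulls any class-averaged score well below its raw accuracy. This is one plausible reason a constant baseline can score far lower than a trained classifier.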