A corpus-based approach to topic in Danish dialog

  • Authors: Philip Diderichsen; Jakob Elming
  • Affiliations: Lund University Cognitive Science, Lund University, Sweden; Copenhagen Business School, Denmark
  • Venue: ACLstudent '05 Proceedings of the ACL Student Research Workshop
  • Year: 2005

Abstract

We report on an investigation of the pragmatic category of topic in Danish dialog and its correlation with surface features of NPs. Using a corpus of 444 utterances, we trained a decision tree system on 16 features. The system achieved near-human performance, with success rates of 84–89% and F1-scores of 0.63–0.72 in 10-fold cross-validation tests (human performance: 89% and 0.78). The most important features turned out to be preverbal position, definiteness, pronominalisation, and non-subordination. We discovered that NPs in epistemic matrix clauses (e.g. "I think ...") were seldom topics, and we suspect that this holds for other interpersonal matrix clauses as well.
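
The abstract reports both a success rate and an F1-score, and the two can differ considerably when the class of interest is a minority of the items. The following is a minimal sketch of how the two measures are computed from gold and predicted labels. It assumes a binary labelling of each NP as "topic" or "other"; the function name `accuracy_and_f1`, the label strings, and the toy data are hypothetical and are not taken from the paper.

```python
def accuracy_and_f1(gold, predicted, positive="topic"):
    """Return (accuracy, F1 for the positive class) for two label sequences."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equally long")

    pairs = list(zip(gold, predicted))
    # Counts for the positive class only.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)

    # Accuracy (success rate): share of items labelled correctly.
    accuracy = sum(1 for g, p in pairs if g == p) / len(pairs)

    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


# Toy labels invented for illustration; NOT data from the paper.
gold = ["topic", "other", "other", "other", "topic",
        "other", "other", "other", "other", "other"]
pred = ["topic", "other", "other", "other", "other",
        "topic", "other", "other", "other", "other"]

acc, f1 = accuracy_and_f1(gold, pred)
print(f"success rate = {acc:.2f}, F1 = {f1:.2f}")
```

On this toy input the success rate is 0.80 while F1 is only 0.50, because the many correctly labelled "other" items raise accuracy but do not enter the F1 computation.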