Comparing the use of edited and unedited text in parser self-training

  • Authors: Jennifer Foster, Özlem Çetinoğlu, Joachim Wagner, Josef van Genabith

  • Affiliations: Dublin City University, Ireland (all authors)

  • Venue: IWPT '11: Proceedings of the 12th International Conference on Parsing Technologies
  • Year: 2011

Abstract

We compare edited text (newswire) and unedited text (discussion forum posts) as sources of training material in a self-training experiment involving the Brown reranking parser and a test set of sentences from an online sports discussion forum. Grammars induced from the two automatically parsed corpora achieve similar Parseval F-scores, with those induced from the discussion forum material being slightly superior. An error analysis nevertheless reveals that the two types of grammar behave differently.
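
The abstract reports Parseval F-scores without stating the formula. As a rough illustration, the standard labeled-bracketing definition can be sketched as below. This is not the authors' evaluation code: the function name `parseval_f1`, the `(label, start, end)` bracket encoding, and the toy brackets are all assumptions made for the example, and scoring tools such as evalb apply further normalisation (for instance around punctuation) that is not modeled here.

```python
from collections import Counter


def parseval_f1(gold_brackets, test_brackets):
    """Labeled bracketing precision, recall and F1.

    Each bracket is a hypothetical (label, start, end) tuple. Brackets are
    compared as multisets, so a duplicated bracket only matches as many
    times as it occurs in both analyses.
    """
    gold = Counter(gold_brackets)
    test = Counter(test_brackets)
    # Multiset intersection: brackets present in both gold and parser output.
    matched = sum((gold & test).values())
    precision = matched / sum(test.values()) if test else 0.0
    recall = matched / sum(gold.values()) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example (invented data): the parser mislabels one constituent.
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
test = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]
precision, recall, f1 = parseval_f1(gold, test)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

In this toy case three of the four brackets match on each side, so precision, recall, and F1 are all 0.75. Over a whole test set, the matched, gold, and test counts would normally be summed across sentences before taking the ratios.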