Coreference Resolution on Blogs and Commented News

  • Authors:
  • Iris Hendrickx; Veronique Hoste

  • Affiliations:
  • LT3 - Language and Translation Technology Team, University College Ghent, Ghent, Belgium; LT3 - Language and Translation Technology Team, University College Ghent, Ghent, Belgium and Department of Applied Mathematics and Computer Science, Ghent University, Ghent, Belgium

  • Venue:
  • DAARC '09 Proceedings of the 7th Discourse Anaphora and Anaphor Resolution Colloquium on Anaphora Processing and Applications
  • Year:
  • 2009

Abstract

We focus on automatic coreference resolution for blogs and news articles with user comments, as part of a project on opinion mining. Our aim is to study the effect of the genre shift from edited, structured newspaper text to unedited, unstructured blog data. We evaluate our coreference resolution system on three data sets: newspaper articles, a mix of newspaper articles and reader comments, and blog data. As expected, the performance of the system drops drastically when it is tested on unedited text. We describe the characteristics of the different data sets and examine the typical errors made by the resolution system.