Human judgment as a parameter in evaluation campaigns

  • Authors:
  • Jean-Baptiste Berthelin, Cyril Grouin, Martine Hurault-Plantet, Patrick Paroubek

  • Affiliations:
  • LIMSI-CNRS, Orsay Cedex (all authors)

  • Venue:
  • HumanJudge '08: Proceedings of the Workshop on Human Judgements in Computational Linguistics
  • Year:
  • 2008

Abstract

The relevance of human judgment in an evaluation campaign is illustrated here through the DEFT text mining campaigns. As a first step, testing a candidate topic on a small number of human evaluators tells us whether the task is feasible. This information comes both from the results the judges obtain and from their personal impressions after taking the test. As a second step, the individual judges' results, together with their pairwise agreement, are used to adjust the task (the choice of a marking scale for DEFT'07 and the selection of topical categories for DEFT'08). Finally, comparing the competitors' results with one another at the end of the campaign confirms the choices made at the outset and provides a basis for redefining the task for a future campaign on the same topic.
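
The pairwise agreement between judges mentioned in the abstract is commonly quantified with a chance-corrected coefficient such as Cohen's kappa. As a minimal sketch (not from the paper; the judge names and labels below are hypothetical), the following Python snippet computes kappa for every pair of judges who annotated the same set of texts:

    from collections import Counter
    from itertools import combinations

    def cohen_kappa(a, b):
        # Chance-corrected agreement between two equal-length label sequences.
        n = len(a)
        p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
        ca, cb = Counter(a), Counter(b)
        p_e = sum(ca[l] * cb[l] for l in ca.keys() | cb.keys()) / (n * n)  # chance agreement
        return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)

    # Hypothetical judgments: three judges label the same five texts.
    judgments = {
        "judge_A": ["pos", "neg", "pos", "neu", "pos"],
        "judge_B": ["pos", "neg", "neu", "neu", "pos"],
        "judge_C": ["neg", "neg", "pos", "neu", "pos"],
    }

    for j1, j2 in combinations(sorted(judgments), 2):
        kappa = cohen_kappa(judgments[j1], judgments[j2])
        print(f"{j1} vs {j2}: kappa = {kappa:.2f}")

In a setting like DEFT's, consistently low pairwise values would be one signal that the task definition (the marking scale or the category set) needs adjustment before launching the campaign.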