Active learning for the identification of nonliteral language

  • Authors:
  • Julia Birke;Anoop Sarkar

  • Affiliations:
  • Simon Fraser University, Burnaby, BC, Canada;Simon Fraser University, Burnaby, BC, Canada

  • Venue:
  • FigLanguages '07 Proceedings of the Workshop on Computational Approaches to Figurative Language
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper we present an active learning approach used to create an annotated corpus of literal and nonliteral usages of verbs. The model uses nearly unsupervised word-sense disambiguation and clustering techniques. We report on experiments in which a human expert is asked to correct system predictions in different stages of learning: (i) after the last iteration when the clustering step has converged, or (ii) during each iteration of the clustering algorithm. The model obtains an f-score of 53.8% on a dataset in which literal/nonliteral usages of 25 verbs were annotated by human experts. In comparison, the same model augmented with active learning obtains 64.91%. We also measure the number of examples required when model confidence is used to select examples for human correction as compared to random selection. The results of this active learning system have been compiled into a freely available annotated corpus of literal/nonliteral usage of verbs in context.