A scaleable automated quality assurance technique for semantic representations and proposition banks

  • Authors:
  • K. Bretonnel Cohen; Lawrence E. Hunter; Martha Palmer

  • Affiliations:
  • University of Colorado at Boulder; Computational Bioscience Program, U. of Colorado School of Medicine; University of Colorado at Boulder

  • Venue:
  • LAW V '11 Proceedings of the 5th Linguistic Annotation Workshop
  • Year:
  • 2011

Abstract

This paper presents an evaluation of an automated quality assurance technique for a type of semantic representation known as a predicate argument structure. These representations are crucial to the development of an important class of corpus known as a proposition bank. Previous work (Cohen and Hunter, 2006) proposed and tested an analytical technique based on a simple discovery procedure inspired by classic structural linguistic methodology. Cohen and Hunter applied the technique manually to a small set of representations. Here we test whether the technique can be automated, and whether it scales to a set of semantic representations and a corpus many times larger than those used by Cohen and Hunter. We conclude that the technique is completely automatable, uncovers missing sense distinctions and other bad semantic representations, and scales well, identifying bad representations with an accuracy of 69%. We also report on the implications of our findings for the correctness of the semantic representations in PropBank.
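The abstract does not spell out the discovery procedure, so the following is only a hypothetical sketch of how a corpus-driven check on predicate argument structures might be automated. It assumes each representation is reduced to the set of argument labels it defines (PropBank-style `Arg0`, `Arg1`, ...) and that the corpus supplies, for every predicate occurrence, the labels actually realized. The function name `flag_suspect_representations`, the `min_count` threshold, and the heuristic itself (flag any defined argument that is rarely or never observed) are illustrative assumptions, not the authors' published method.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical input shapes (assumptions, not a real PropBank API):
#   frames:    predicate -> set of argument labels its representation defines
#   instances: (predicate, set of argument labels realized in one corpus occurrence)
Frames = Dict[str, Set[str]]
Instance = Tuple[str, Set[str]]


def flag_suspect_representations(
    frames: Frames, instances: Iterable[Instance], min_count: int = 1
) -> Dict[str, List[str]]:
    """Return predicate -> sorted defined arguments observed fewer than min_count times.

    Illustrative heuristic only: a defined argument that (almost) never surfaces
    in a large corpus is treated as a sign the representation may be wrong.
    """
    # One counter of realized argument labels per predicate that has a frame.
    observed: Dict[str, Counter] = {pred: Counter() for pred in frames}

    # Single pass over the corpus, so cost grows linearly with corpus size.
    for pred, args in instances:
        if pred in observed:  # occurrences of predicates without a frame are skipped
            observed[pred].update(args)

    suspects: Dict[str, List[str]] = {}
    for pred, defined in frames.items():
        # A Counter returns 0 for labels never seen, so unseen arguments are flagged.
        rare = sorted(a for a in defined if observed[pred][a] < min_count)
        if rare:
            suspects[pred] = rare
    return suspects
```

For example, with `frames = {"bind": {"Arg0", "Arg1", "Arg2"}, "express": {"Arg0", "Arg1"}}` and instances in which `Arg2` of the hypothetical predicate `bind` never appears, the function returns `{"bind": ["Arg2"]}`. Whether a flagged representation is actually bad would still need human review.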