Validation and regression testing for a cross-linguistic grammar resource

  • Authors:
  • Emily M. Bender;Laurie Poulson;Scott Drellishak;Chris Evans

  • Affiliations:
  • University of Washington, Seattle, WA;University of Washington, Seattle, WA;University of Washington, Seattle, WA;University of Washington, Seattle, WA

  • Venue:
  • DeepLP '07 Proceedings of the Workshop on Deep Linguistic Processing
  • Year:
  • 2007

Abstract

We present a validation methodology for a cross-linguistic grammar resource that produces output in the form of small grammars based on elicited typological descriptions. Evaluating the resource entails sampling from a very large space of language types, the type and range of which preclude the use of standard test suite development techniques. We produce a database from which gold standard test suites for these grammars can be generated on demand, including well-formed strings paired with all of their valid semantic representations as well as a sample of ill-formed strings. These string-semantics pairs are selected from a set of candidates by a system of regular-expression-based filters. The filters amount to an alternative grammar building system, whose generative capacity is limited compared to the actual grammars. We perform error analysis of the discrepancies between the test suites and grammars for a range of language types, and update both systems appropriately. The resulting resource serves as a point of comparison for regression testing in future development.
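The filtering step described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the toy vocabulary (`n1`, `n2`, `tv`), the `subj=...` semantic labels, the two language types (`sov`, `svo`), and the names `Candidate`, `FILTERS`, and `build_test_suite` are all hypothetical. The sketch only shows the general idea of selecting well-formed string-semantics pairs from a candidate pool by rejecting any pair that matches a regular-expression filter for the chosen language type.

```python
import re
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Candidate:
    string: str     # space-separated abstract lexical items (hypothetical vocabulary)
    semantics: str  # label standing in for a semantic representation (hypothetical)

    def key(self) -> str:
        # Filters see the pair as one line: "<semantics>|<string>".
        return f"{self.semantics}|{self.string}"


# Hypothetical filter applied to every language type in this toy example:
# reject the pair if the first noun in the string is not the subject.
SUBJECT_FIRST = re.compile(r"^subj=(n\d)\|(?:tv )?(?!\1\b)n\d")

# Hypothetical per-language-type filters. A candidate pair that matches ANY
# filter for its language type is treated as ill-formed.
FILTERS = {
    "sov": [re.compile(r"\btv\b(?!$)"), SUBJECT_FIRST],  # verb must be final
    "svo": [re.compile(r"\|tv\b"), re.compile(r"\btv$"), SUBJECT_FIRST],  # verb must be medial
}


def build_test_suite(candidates, language_type):
    """Split candidates into (well_formed, ill_formed) for one language type."""
    patterns = FILTERS[language_type]
    well_formed, ill_formed = [], []
    for cand in candidates:
        if any(p.search(cand.key()) for p in patterns):
            ill_formed.append(cand)
        else:
            well_formed.append(cand)
    return well_formed, ill_formed


# Candidate pool: every ordering of the three items, paired with each
# possible choice of subject.
CANDIDATES = [
    Candidate(" ".join(order), f"subj={subj}")
    for order in permutations(["n1", "n2", "tv"])
    for subj in ("n1", "n2")
]

if __name__ == "__main__":
    good, bad = build_test_suite(CANDIDATES, "sov")
    for cand in good:
        print("well-formed:", cand.string, "/", cand.semantics)
    print(len(bad), "ill-formed candidates available for sampling")
```

Because the filters are indexed by language type, the same candidate pool can yield a different gold standard for each type on demand, and the rejected pairs provide the pool from which ill-formed test items can be sampled.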