Software testing and the naturally occurring data assumption in natural language processing

  • Authors:
  • K. Bretonnel Cohen; William A. Baumgartner, Jr.; Lawrence Hunter

  • Affiliations:
  • The MITRE Corporation; University of Colorado School of Medicine; University of Colorado School of Medicine

  • Venue:
  • SETQA-NLP '08 Software Engineering, Testing, and Quality Assurance for Natural Language Processing
  • Year:
  • 2008

Abstract

It is a widely accepted belief in natural language processing research that naturally occurring data is the best (and perhaps the only appropriate) data for testing text mining systems. This paper compares the code coverage achieved by a suite of functional tests with the coverage achieved by a large corpus, and finds that the structured tests achieve higher class, line, and branch coverage than even a very large corpus.
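
The kind of comparison the abstract describes can be illustrated with a toy example. The sketch below is not the authors' system or tooling: `classify_token`, the miniature "corpus", and the structured inputs are all hypothetical, and line coverage is measured with Python's standard `sys.settrace` hook rather than a dedicated coverage tool. It only shows how a small set of deliberately constructed inputs can execute lines that a much larger sample of ordinary text never reaches.

```python
import sys


def classify_token(token):
    """Toy rule-based classifier (hypothetical, for illustration only)."""
    if token.isdigit():
        return "NUMBER"
    if token.startswith("-") and token[1:].isdigit():
        return "NEGATIVE_NUMBER"
    if "-" in token:
        return "HYPHENATED"
    if token.isupper():
        return "ACRONYM"
    return "WORD"


def covered_lines(func, inputs):
    """Return the set of source line numbers of `func` executed over `inputs`."""
    code = func.__code__
    hit = set()

    def tracer(frame, event, arg):
        # Only follow frames that belong to the function under test.
        if frame.f_code is code:
            if event == "line":
                hit.add(frame.f_lineno)
            return tracer
        return None

    sys.settrace(tracer)
    try:
        for item in inputs:
            func(item)
    finally:
        sys.settrace(None)
    return hit


# A large but repetitive "corpus": many tokens, few distinct code paths.
corpus = ["the", "protein", "binds", "to", "the", "receptor", "42"] * 1000

# A handful of structured tests, one per rule in the classifier.
structured = ["42", "-7", "wild-type", "DNA", "gene"]

corpus_hit = covered_lines(classify_token, corpus)
structured_hit = covered_lines(classify_token, structured)

print("lines covered by corpus:          ", len(corpus_hit))
print("lines covered by structured tests:", len(structured_hit))
print("lines reached only by structured tests:", sorted(structured_hit - corpus_hit))
```

In this toy setting the 7,000-token corpus exercises only the number and plain-word paths, while five structured inputs reach every rule. Whether the same holds for a real text mining system is the empirical question the paper addresses.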