Corpora and data preparation

  • Authors: Lynn Carlson; Boyan Onyshkevych; Mary Ellen Okurowski
  • Affiliations: Ft. Meade, MD; Ft. Meade, MD; Ft. Meade, MD
  • Venue: MUC5 '93 Proceedings of the 5th conference on Message understanding
  • Year: 1993

Abstract

The data selection and data preparation efforts that led to the TIPSTER and Fifth Message Understanding Conference (MUC-5) evaluation corpora required substantial effort, time, and resources. The Government's commitment to these efforts stems from four TIPSTER Program objectives: (1) to provide training data that would promote the development of information extraction technology, (2) to provide accurate test data with which to evaluate and baseline system performance objectively, (3) to provide a baseline for human performance against which to understand and interpret machine performance, and (4) to support the larger Natural Language Processing community by making available, under ARPA support, a unique set of texts and templates in multiple domains and languages. This commitment was demonstrated through managerial, technical, and administrative support from various Government agencies, as well as through contractual efforts with the Institute for Defense Analyses for data preparation and with New Mexico State University for software tool development.