Annotated Gigaword

  • Authors:
  • Courtney Napoles; Matthew Gormley; Benjamin Van Durme

  • Affiliations:
  • Johns Hopkins University; Johns Hopkins University; Johns Hopkins University

  • Venue:
  • AKBC-WEKEX '12: Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction

  • Year:
  • 2012

Abstract

We have created layers of annotation on the English Gigaword v.5 corpus to render it useful as a standardized corpus for knowledge extraction and distributional semantics. Most existing large-scale work is based on inconsistent corpora that research teams have often had to re-annotate independently, each time introducing biases that make results comparable only at a high level. We provide the community with a public reference set based on current state-of-the-art syntactic analysis and coreference resolution, along with an interface for programmatic access. Our goal is to enable broader involvement in large-scale knowledge-acquisition efforts by researchers who otherwise might not have been able to produce such a resource on their own.