Using the web to overcome data sparseness

  • Authors:
  • Frank Keller;Maria Lapata;Olga Ourioupina

  • Affiliations:
  • University of Edinburgh, Edinburgh, UK;University of Edinburgh, Edinburgh, UK;Saarland University, Saarbrücken, Germany

  • Venue:
  • EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper shows that the web can be employed to obtain frequencies for bigrams that are unseen in a given corpus. We describe a method for retrieving counts for adjective-noun, noun-noun, and verb-object bigrams from the web by querying a search engine. We evaluate this method by demonstrating that web frequencies and correlate with frequencies obtained from a carefully edited, balanced corpus. We also perform a task-based evaluation, showing that web frequencies can reliably predict human plausibility judgments.