Using the web as an implicit training set: application to structural ambiguity resolution

  • Authors: Preslav Nakov; Marti Hearst
  • Affiliations: University of California at Berkeley, Berkeley, CA; University of California at Berkeley, Berkeley, CA
  • Venue: HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
  • Year: 2005

Abstract

Recent work has shown that very large corpora can act as training data for NLP algorithms even without explicit labels. In this paper we show how surface features and paraphrases, issued as queries against search engines, can be used to infer labels for structural ambiguity resolution tasks. Using unsupervised algorithms, we achieve 84% precision on PP-attachment and 80% on noun compound coordination.
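
As a rough illustration of the general idea, the sketch below decides a PP-attachment by comparing how often the preposition follows the verb versus the noun. It is a minimal sketch under stated assumptions, not the authors' method: the abstract does not specify the features, paraphrases, or decision rule used. The names `TOY_HITS`, `toy_hit_count`, and `attach_pp` are hypothetical, and the counts are invented stand-ins for search-engine page hits.

```python
from typing import Callable, Dict

# Toy stand-in for search-engine page-hit counts. These numbers are
# invented for illustration only; they are not real query results.
TOY_HITS: Dict[str, int] = {
    "ate with": 900,
    "pizza with": 300,
    "saw with": 200,
    "man with": 1500,
}


def toy_hit_count(query: str) -> int:
    """Return the (made-up) hit count for an exact-phrase query."""
    return TOY_HITS.get(query, 0)


def attach_pp(
    verb: str,
    noun: str,
    prep: str,
    hit_count: Callable[[str], int] = toy_hit_count,
    smoothing: float = 1.0,
) -> str:
    """Guess whether the PP headed by `prep` attaches to the verb or the noun.

    Assumed heuristic: compare smoothed counts of "<verb> <prep>" and
    "<noun> <prep>". Ties fall back to noun attachment.
    """
    verb_hits = hit_count(f"{verb} {prep}") + smoothing
    noun_hits = hit_count(f"{noun} {prep}") + smoothing
    return "verb" if verb_hits > noun_hits else "noun"


if __name__ == "__main__":
    print(attach_pp("ate", "pizza", "with"))
    print(attach_pp("saw", "man", "with"))
```

To run this against real data, `hit_count` could be replaced by any callable that maps a phrase to a frequency, such as a lookup into a local n-gram table. No real search-engine API is assumed here.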