Deducing linguistic structure from the statistics of large corpora

  • Authors:
  • Eric Brill; David Magerman; Mitchell Marcus; Beatrice Santorini

  • Venue:
  • HLT '90 Proceedings of the workshop on Speech and Natural Language
  • Year:
  • 1990

Abstract

Within the last two years, approaches using both stochastic and symbolic techniques have proved adequate to deduce lexical ambiguity resolution rules with an error rate below 3-4% when trained on moderately sized (500K-word) corpora of English text (e.g. Church, 1988; Hindle, 1989). The success of these techniques suggests that much of the grammatical structure of language may be derived automatically through distributional analysis, an approach attempted and abandoned in the 1950s.