Part-of-speech induction from scratch

  • Authors: Hinrich Schütze
  • Affiliations: Center for the Study of Language and Information, Stanford, CA
  • Venue: ACL '93: Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics
  • Year: 1993

Abstract

This paper presents a method for inducing the parts of speech of a language and part-of-speech labels for individual words from a large text corpus. Vector representations for the part-of-speech of a word are formed from entries of its near lexical neighbors. A dimensionality reduction creates a space representing the syntactic categories of unambiguous words. A neural net trained on these spatial representations classifies individual contexts of occurrence of ambiguous words. The method classifies both ambiguous and unambiguous words correctly with high accuracy.
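
The first two steps of the abstract (neighbor-based vectors, then dimensionality reduction) can be illustrated with a minimal sketch. It is not the paper's implementation, and everything specific in it is an assumption made for illustration: the toy corpus, the use of only the immediate left and right neighbor as context, truncated SVD as the reduction step (the abstract does not name one), the choice k=4, and the helper names neighbor_vectors, reduce_dimensions, and cosine. The neural-net classifier for ambiguous words is left out.

```python
import numpy as np

def neighbor_vectors(tokens, vocab):
    """Count immediate left and right neighbors of each word (one row per word)."""
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    counts = np.zeros((n, 2 * n))
    for left, right in zip(tokens, tokens[1:]):
        if left in index and right in index:
            counts[index[right], index[left]] += 1      # `left` precedes `right`
            counts[index[left], n + index[right]] += 1  # `right` follows `left`
    return counts

def reduce_dimensions(matrix, k):
    """Project rows onto the top-k singular directions (truncated SVD, an assumption)."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy corpus; a real experiment would use a large text corpus.
tokens = "the cat sleeps . the dog sleeps . a cat runs . a dog runs .".split()
vocab = sorted(set(tokens))
space = reduce_dimensions(neighbor_vectors(tokens, vocab), k=4)
row = {w: space[i] for i, w in enumerate(vocab)}

print("cat vs dog:", cosine(row["cat"], row["dog"]))
print("cat vs the:", cosine(row["cat"], row["the"]))
```

In this toy corpus "cat" and "dog" occur with the same neighbors, so they land close together in the reduced space, while "cat" and "the" do not. Words that share a syntactic category tend to share neighbors, which is the intuition behind using such a space to represent categories.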