Morphology induction from term clusters

  • Authors:
  • Dayne Freitag

  • Affiliations:
  • HNC Software, San Diego, CA

  • Venue:
  • CONLL '05 Proceedings of the Ninth Conference on Computational Natural Language Learning
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

We address the problem of learning a morphological automaton directly from a monolingual text corpus without recourse to additional resources. Like previous work in this area, our approach exploits orthographic regularities in a search for possible morphological segmentation points. Instead of affixes, however, we search for affix transformation rules that express correspondences between term clusters induced from the data. This focuses the system on substrings having syntactic function, and yields cluster-to-cluster transformation rules which enable the system to process unknown morphological forms of known words accurately. A stem-weighting algorithm based on Hubs and Authorities is used to clarify ambiguous segmentation points. We evaluate our approach using the CELEX database.