Modeling syntactic context improves morphological segmentation

  • Authors:
  • Yoong Keok Lee; Aria Haghighi; Regina Barzilay

  • Affiliations:
  • Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology (all three authors)

  • Venue:
  • CoNLL '11: Proceedings of the Fifteenth Conference on Computational Natural Language Learning
  • Year:
  • 2011

Abstract

The connection between part-of-speech (POS) categories and morphological properties is well-documented in linguistics but underutilized in text processing systems. This paper proposes a novel model for morphological segmentation that is driven by this connection. Our model learns that words with common affixes are likely to be in the same syntactic category and uses learned syntactic categories to refine the segmentation boundaries of words. Our results demonstrate that incorporating POS categorization yields substantial performance gains on morphological segmentation of Arabic.