Handling unknown words in statistical latent-variable parsing models for Arabic, English and French

  • Authors:
  • Mohammed Attia; Jennifer Foster; Deirdre Hogan; Joseph Le Roux; Lamia Tounsi; Josef van Genabith

  • Affiliations:
  • Dublin City University (all authors)

  • Venue:
  • SPMRL '10 Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages
  • Year:
  • 2010

Abstract

This paper studies how simple and complex morphological clues can improve the classification of rare and unknown words in parsing. We compare this approach with a language-independent technique, common in parsers, that relies solely on word frequencies. The study covers three languages that differ in morphological expressiveness: Arabic, French and English. We integrate information about Arabic affixes and morphotactics into a PCFG-LA parser and obtain state-of-the-art accuracy. We also show that these morphological clues can be learnt automatically from an annotated corpus.
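
To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a frequency-only baseline that maps every rare or unseen word to a single UNK symbol, and a clue-based variant that maps such words to a signature built from surface features. The threshold, the English suffix list, and all function names are hypothetical choices made for illustration; the paper's actual clues (in particular those for Arabic affixes and morphotactics) are not reproduced here.

```python
from collections import Counter

RARE_THRESHOLD = 2  # hypothetical cutoff: words seen fewer times count as rare

# Hypothetical English suffix clues; not the paper's actual feature set.
SUFFIXES = ("ing", "ed", "ly", "tion", "s")


def count_words(sentences):
    """Count how often each token occurs in the training sentences."""
    return Counter(tok for sent in sentences for tok in sent)


def frequency_only(token, counts, threshold=RARE_THRESHOLD):
    """Baseline: every rare or unseen word collapses to one UNK symbol."""
    return token if counts[token] >= threshold else "UNK"


def with_morph_clues(token, counts, threshold=RARE_THRESHOLD):
    """Rare or unseen words become a signature built from surface clues."""
    if counts[token] >= threshold:
        return token
    parts = ["UNK"]
    if token[:1].isupper():
        parts.append("CAP")  # capitalised first letter
    if any(ch.isdigit() for ch in token):
        parts.append("NUM")  # contains a digit
    for suffix in SUFFIXES:
        # Record only the first matching suffix, and never the whole word.
        if token.lower().endswith(suffix) and len(token) > len(suffix):
            parts.append(suffix)
            break
    return "-".join(parts)


if __name__ == "__main__":
    train = [["the", "dog", "barks"], ["the", "dog", "sleeps"]]
    counts = count_words(train)
    for word in ["the", "barks", "running", "Dublin"]:
        print(word, frequency_only(word, counts), with_morph_clues(word, counts))
```

Under these assumptions the baseline maps "barks", "running" and "Dublin" all to the same UNK symbol, whereas the clue-based variant keeps them apart as UNK-s, UNK-ing and UNK-CAP, which gives a parser more evidence about the likely part of speech of each unknown word.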