Automatic treebank-based acquisition of Arabic LFG dependency structures

  • Authors:
  • Lamia Tounsi; Mohammed Attia; Josef van Genabith

  • Affiliations:
  • Dublin City University, Ireland; Dublin City University, Ireland; Dublin City University, Ireland

  • Venue:
  • Semitic '09 Proceedings of the EACL 2009 Workshop on Computational Approaches to Semitic Languages
  • Year:
  • 2009

Abstract

A number of papers have reported on methods for the automatic acquisition of large-scale, probabilistic LFG-based grammatical resources from treebanks for English (Cahill et al., 2002; Cahill et al., 2004), German (Cahill et al., 2003), Chinese (Burke, 2004; Guo et al., 2007), Spanish (O'Donovan, 2004; Chrupala and van Genabith, 2006) and French (Schluter and van Genabith, 2008). Here, we extend the LFG grammar acquisition approach to Arabic and the Penn Arabic Treebank (ATB) (Maamouri and Bies, 2004), adapting and extending the methodology of Cahill et al. (2004), originally developed for English. Arabic is challenging because of its morphological richness and syntactic complexity. Currently, 98% of ATB trees (excluding those labelled FRAG and X) produce a covering and connected f-structure. We conduct a qualitative evaluation of our annotation against a gold standard and achieve an f-score of 95%.