Aligned-Parallel-Corpora Based Semi-Supervised Learning for Arabic Mention Detection

  • Authors:
  • Imed Zitouni; Yassine Benajiba

  • Affiliations:
  • Microsoft, Redmond, WA, USA; Thomson Reuters, New York, NY, USA

  • Venue:
  • IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP)
  • Year:
  • 2014

Abstract

In the last two decades, significant effort has been put into annotating linguistic resources in several languages. Despite this valiant effort, many languages still have only small amounts of such resources. The goal of this article is to present and investigate a method of propagating information (specifically, mentions) from a resource-rich language such as English into a relatively resource-poor language such as Arabic. We also compare this approach to its counterpart that uses monolingual resources. Part of the investigation is to quantify the contribution of propagated information under different conditions, depending on the availability of resources in the target language. Experiments on the Arabic-English language pair show that relatively decent performance can be achieved by propagating information from a language with richer resources, such as English, into Arabic alone (with no resources or models in the target language, Arabic). Furthermore, results show that features propagated from English help improve the Arabic system's performance even when used in conjunction with all feature types built from the target language. Experiments also show that using propagated features together with lexically derived features only (as can be obtained directly from a mention-annotated corpus) brings system performance to the level obtained in the target language with features derived from many linguistic resources, thereby improving the system when such resources are not available.
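The abstract does not spell out how mentions are propagated across the aligned parallel corpus. As a hedged illustration only, the sketch below shows one common, generic form of annotation projection: each English mention span is mapped through word-alignment links to the smallest contiguous Arabic span covering its aligned tokens, and the projected spans are then turned into per-token BIO tags that a target-language tagger could consume as an extra "propagated" feature. The function names `project_mentions` and `to_bio_features`, the `(start, end, type)` span encoding, and the example sentence and alignment are hypothetical; this is not presented as the authors' exact method.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical span encoding: (start, end, type) over token indices, end exclusive.
Mention = Tuple[int, int, str]


def project_mentions(
    src_mentions: List[Mention],
    alignment: Set[Tuple[int, int]],
) -> List[Mention]:
    """Project source-side (e.g. English) mention spans onto the target sentence.

    Each source span maps to the smallest contiguous target span covering every
    target token aligned to any token in the source span. Spans with no aligned
    target tokens are dropped. Generic illustration, not the paper's algorithm.
    """
    # Index alignment links (source_index, target_index) by source position.
    src_to_tgt: Dict[int, List[int]] = {}
    for s, t in alignment:
        src_to_tgt.setdefault(s, []).append(t)

    projected: List[Mention] = []
    for start, end, mtype in src_mentions:
        targets = [t for s in range(start, end) for t in src_to_tgt.get(s, [])]
        if targets:
            projected.append((min(targets), max(targets) + 1, mtype))
    return projected


def to_bio_features(mentions: List[Mention], n_tokens: int) -> List[str]:
    """Turn projected spans into one BIO tag per target token.

    When spans overlap, the span with the earliest start is kept and later
    overlapping spans are skipped.
    """
    tags = ["O"] * n_tokens
    for start, end, mtype in sorted(mentions):
        if any(tag != "O" for tag in tags[start:end]):
            continue  # overlaps an already-tagged span
        tags[start] = f"B-{mtype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{mtype}"
    return tags


if __name__ == "__main__":
    # Hypothetical pair: "President Obama visited Cairo" / "زار الرئيس أوباما القاهرة"
    english_mentions = [(0, 2, "PER"), (3, 4, "GPE")]
    arabic_tokens = ["زار", "الرئيس", "أوباما", "القاهرة"]
    links = {(0, 1), (1, 2), (2, 0), (3, 3)}  # (english_index, arabic_index)
    spans = project_mentions(english_mentions, links)
    print(spans)
    print(to_bio_features(spans, len(arabic_tokens)))
```

Running the example prints `[(1, 3, 'PER'), (3, 4, 'GPE')]` followed by `['O', 'B-PER', 'I-PER', 'B-GPE']`. Taking the min/max closure over aligned positions tolerates gaps in the alignment but can over-extend a span when a link is noisy; real Arabic pipelines also typically segment clitics before alignment, which this sketch ignores.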