Time for More Languages: Temporal Tagging of Arabic, Italian, Spanish, and Vietnamese

  • Authors:
  • Jannik Strötgen; Ayser Armiti; Tran Van Canh; Julian Zell; Michael Gertz

  • Affiliations:
  • Heidelberg University; Heidelberg University; Heidelberg University; Heidelberg University; Heidelberg University

  • Venue:
  • ACM Transactions on Asian Language Information Processing (TALIP)
  • Year:
  • 2014

Abstract

Most research on temporal tagging so far has focused on English text documents, and hardly any multilingual temporal taggers support more than two languages. Recently, the temporal tagger HeidelTime has been made publicly available; it supports the integration of new languages through the development of language-dependent resources, without modifying the source code. In this article, we describe our work on developing such resources for two Asian and two Romance languages: Arabic, Vietnamese, Spanish, and Italian. While temporal tagging of the two Romance languages has been addressed before, there has been almost no research on Arabic and Vietnamese temporal tagging so far. Furthermore, we analyze language-dependent challenges for temporal tagging and explain the strategies we followed to address them. Our evaluation results on publicly available and newly annotated corpora demonstrate the high quality of our new resources for the four languages, which we make publicly available to the research community.