Time for More Languages: Temporal Tagging of Arabic, Italian, Spanish, and Vietnamese

Authors:
Jannik Strötgen;Ayser Armiti;Tran Van Canh;Julian Zell;Michael Gertz
Affiliations:
Heidelberg University;Heidelberg University;Heidelberg University;Heidelberg University;Heidelberg University
Venue:
ACM Transactions on Asian Language Information Processing (TALIP)
Year:
2014

Abstract

Most research on temporal tagging to date has targeted English text, and hardly any multilingual temporal taggers support more than two languages. The temporal tagger HeidelTime, recently made publicly available, allows new languages to be integrated by developing language-dependent resources, without modifying the source code. In this article, we describe our work on developing such resources for two Asian and two Romance languages: Arabic, Vietnamese, Spanish, and Italian. While temporal tagging of the two Romance languages has been addressed before, there has been almost no research on temporal tagging of Arabic and Vietnamese. Furthermore, we analyze language-dependent challenges for temporal tagging and explain the strategies we followed to address them. Our evaluation results on publicly available and newly annotated corpora demonstrate the high quality of our new resources for the four languages, which we make publicly available to the research community.
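
The idea of adding a language through resources alone, leaving the tagging code untouched, can be illustrated with a minimal sketch. This is not HeidelTime's actual resource format or API: the `LanguageResources` class, the `%MONTH%` placeholder, the `RESOURCES` table, and the `tag_dates` function are all hypothetical names chosen for illustration, and only one simple date pattern per language is covered.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageResources:
    """Hypothetical per-language resources: a month lexicon and one date rule."""
    months: dict          # lowercase month name -> month number
    date_pattern: str     # regex template; %MONTH% is replaced by the month names


# Adding a language means adding an entry here; tag_dates() stays unchanged.
RESOURCES = {
    "es": LanguageResources(
        months={"enero": 1, "febrero": 2, "marzo": 3, "abril": 4,
                "mayo": 5, "junio": 6, "julio": 7, "agosto": 8,
                "septiembre": 9, "octubre": 10, "noviembre": 11,
                "diciembre": 12},
        date_pattern=r"\b(?P<day>\d{1,2}) de (?P<month>%MONTH%) de (?P<year>\d{4})\b",
    ),
    "it": LanguageResources(
        months={"gennaio": 1, "febbraio": 2, "marzo": 3, "aprile": 4,
                "maggio": 5, "giugno": 6, "luglio": 7, "agosto": 8,
                "settembre": 9, "ottobre": 10, "novembre": 11,
                "dicembre": 12},
        date_pattern=r"\b(?P<day>\d{1,2}) (?P<month>%MONTH%) (?P<year>\d{4})\b",
    ),
}


def tag_dates(text, lang):
    """Return (matched text, ISO date) pairs using only the resources for `lang`."""
    res = RESOURCES[lang]
    # Build the month alternation from the lexicon, longest names first.
    names = sorted(res.months, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in names)
    pattern = re.compile(res.date_pattern.replace("%MONTH%", alternation),
                         re.IGNORECASE)
    results = []
    for m in pattern.finditer(text):
        month = res.months[m.group("month").lower()]
        value = f"{int(m.group('year')):04d}-{month:02d}-{int(m.group('day')):02d}"
        results.append((m.group(0), value))
    return results


if __name__ == "__main__":
    print(tag_dates("La reunión fue el 5 de marzo de 2013.", "es"))
    print(tag_dates("Il 12 aprile 2012 a Roma.", "it"))
```

In this sketch the extraction and normalization logic is shared across languages, and everything language-specific lives in data. A real temporal tagger needs far richer resources, for example for relative expressions or for the language-dependent challenges the abstract mentions.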