Automatic annotation of bibliographical references with target language

  • Authors:
  • Harald Hammarström

  • Affiliations:
  • Chalmers University, Gothenburg, Sweden

  • Venue:
  • MMIES '08 Proceedings of the Workshop on Multi-source Multilingual Information Extraction and Summarization
  • Year:
  • 2008

Abstract

In a large-scale project to list bibliographical references to all of the ca. 7,000 languages of the world, the need arises to automatically annotate the bibliographical entries with ISO 639-3 language identifiers. The task can be seen as a special case of a more general Information Extraction problem: classifying short text snippets in various languages into a large number of classes. We explore supervised and unsupervised approaches motivated by the distributional characteristics of the specific domain and by the availability of data sets. In all cases, we make use of a database of language names and identifiers. The suggested methods are rigorously evaluated on a fresh, representative data set.
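To make the role of the name database concrete, the following is a minimal sketch of a naive baseline. It is not the method of the paper. It assumes a toy, hypothetical name-to-code dictionary (`LANGUAGE_NAMES`, with a handful of real ISO 639-3 codes) and a hypothetical function `annotate`. The function tags a bibliographic entry with every code whose language name occurs in it as a whole word.

```python
import re

# Hypothetical toy name database (assumption, not the paper's data):
# a few language names mapped to their ISO 639-3 identifiers. A real
# database would hold thousands of names, including alternate spellings.
LANGUAGE_NAMES = {
    "hausa": "hau",
    "yoruba": "yor",
    "basque": "eus",
    "finnish": "fin",
}


def annotate(entry: str, names: dict[str, str] = LANGUAGE_NAMES) -> list[str]:
    """Return the sorted ISO 639-3 codes whose language names occur in
    `entry` as whole words (case-insensitive match)."""
    text = entry.lower()
    codes = set()
    for name, code in names.items():
        # \b restricts the match to word boundaries, so a name is not
        # matched inside a longer word (e.g. "hausa" in "hausaland").
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            codes.add(code)
    return sorted(codes)


if __name__ == "__main__":
    print(annotate("A grammar of Hausa"))
    print(annotate("Basque and Finnish verb morphology compared"))
```

This lookup does not handle ambiguity. Examples include a name shared by several languages, a name that is also an ordinary word, and an entry that mentions a language without being about it. It also cannot tag entries that never name the language. Gaps of this kind are presumably what the supervised and unsupervised approaches mentioned above are meant to address.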