A comparison of language identification approaches on short, query-style texts

  • Authors:
  • Thomas Gottron; Nedim Lipka

  • Affiliations:
  • Institut für Informatik, Johannes Gutenberg-Universität Mainz, Mainz, Germany; Faculty of Media, Media Systems, Bauhaus University Weimar, Weimar, Germany

  • Venue:
  • ECIR'2010 Proceedings of the 32nd European conference on Advances in Information Retrieval
  • Year:
  • 2010

Abstract

In a multi-language Information Retrieval setting, knowing the language of a user query is important for further processing. Hence, we compare the performance of some typical approaches for language detection on very short, query-style texts. The results show that an accuracy of more than 80% can be achieved even for single words; for slightly longer texts, we observed accuracy values close to 100%.
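
To make the task concrete, the following is a minimal sketch of one widely used technique for identifying the language of a short text: a Naive Bayes classifier over character trigrams with add-one smoothing. The abstract does not name the approaches that were compared, so this should not be read as the authors' method. The class name `NgramNaiveBayes`, the helper `char_ngrams`, the uniform language prior, and the tiny training sentences are all assumptions made purely for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the character n-grams of a lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Hypothetical character n-gram Naive Bayes language identifier."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}    # language -> Counter of n-gram frequencies
        self.totals = {}    # language -> total number of n-grams seen
        self.vocab = set()  # all n-grams seen in any language

    def train(self, language, texts):
        counter = self.counts.setdefault(language, Counter())
        for text in texts:
            counter.update(char_ngrams(text, self.n))
        self.totals[language] = sum(counter.values())
        self.vocab.update(counter)

    def classify(self, text):
        grams = char_ngrams(text, self.n)
        vocab_size = len(self.vocab)

        def log_likelihood(language):
            counter = self.counts[language]
            total = self.totals[language]
            # Add-one smoothing keeps unseen n-grams from zeroing the score.
            # A uniform prior over languages is assumed, so it is omitted.
            return sum(
                math.log((counter[g] + 1) / (total + vocab_size))
                for g in grams
            )

        return max(self.counts, key=log_likelihood)


# Toy training data, invented for this sketch (not taken from the paper).
clf = NgramNaiveBayes(n=3)
clf.train("en", [
    "the weather is nice today",
    "where is the train station",
    "cheap flights to london",
])
clf.train("de", [
    "das wetter ist heute schön",
    "wo ist der bahnhof",
    "billige flüge nach berlin",
])

for query in ["weather", "bahnhof"]:
    print(query, "->", clf.classify(query))
```

Padding each text with spaces lets word-initial and word-final trigrams contribute evidence, which matters when the input is a single word. A realistic system would train on far larger corpora per language than the three sentences used here.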