Language identification of names with SVMs

  • Authors:
  • Aditya Bhargava;Grzegorz Kondrak

  • Affiliations:
  • University of Alberta, Edmonton, Alberta, Canada;University of Alberta, Edmonton, Alberta, Canada

  • Venue:
  • HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
  • Year:
  • 2010

Quantified Score

Hi-index 0.01

Visualization

Abstract

The task of identifying the language of text or utterances has a number of applications in natural language processing. Language identification has traditionally been approached with character-level language models. However, the language model approach crucially depends on the length of the text in question. In this paper, we consider the problem of language identification of names. We show that an approach based on SVMs with n-gram counts as features performs much better than language models. We also experiment with applying the method to pre-process transliteration data for the training of separate models.