Comparison of String Distance Metrics for Lemmatisation of Named Entities in Polish

  • Authors:
  • Jakub Piskorski;Marcin Sydow;Karol Wieloch

  • Affiliations:
  • Joint Research Centre of the European Commission, Web Mining and Intelligence of IPSC, T.P. 267, Ispra, Italy 21027;Web Mining Lab, Intelligent Systems Dept., Polish-Japanese Institute of Information Technology, Warsaw, Poland 02-008;Department of Information Systems, Poznań University of Economics, Poznań, Poland 60-967

  • Venue:
  • Human Language Technology. Challenges of the Information Society
  • Year:
  • 2009

Abstract

This paper presents the results of recent experiments on the application of string distance metrics to the problem of named entity lemmatisation in Polish. It extends our work in [1] by introducing new results for organisation names. Furthermore, the results presented here and in [2,3], which address the same topic, were used in a comparative study of the average usefulness of the numerous examined string distance metrics for the lemmatisation of Polish named entities of various types. In particular, we focus on the lemmatisation of country names, organisation names and person names.