Automatic selection of heterogeneous syntactic features in semantic similarity of Polish nouns

  • Authors:
  • Maciej Piasecki;Stanisław Szpakowicz;Bartosz Broda

  • Affiliations:
  • Institute of Applied Informatics, Wrocław University of Technology, Poland;School of Information Technology and Engineering, University of Ottawa and Institute of Computer Science, Polish Academy of Sciences;Institute of Applied Informatics, Wrocław University of Technology, Poland

  • Venue:
  • TSD'07 Proceedings of the 10th international conference on Text, speech and dialogue
  • Year:
  • 2007

Abstract

We present experiments with a variety of corpus-based measures applied to the problem of constructing semantic similarity functions for Polish nouns. Rich inflection in Polish allows us to acquire useful syntactic features without parsing; morphosyntactic restrictions checked in a large enough window provide sufficiently useful data. A novel feature selection method achieves an accuracy of 86% on the WordNet-based synonymy test, an improvement of 5% over the previous results.
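
To make the evaluation setting concrete, the following is a minimal sketch of how a corpus-based similarity function could be scored on a synonymy test of the kind mentioned in the abstract. It is not the authors' method: the cosine measure, the feature names (such as `adj:szybki`), the counts, and the helper names `cosine`, `answer_question`, and `accuracy` are all assumptions chosen for illustration, and the toy data is invented.

```python
from collections import Counter
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(u[f] * v[f] for f in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def answer_question(target, candidates, vectors):
    """Pick the candidate most similar to the target noun."""
    return max(candidates, key=lambda c: cosine(vectors[target], vectors[c]))


def accuracy(questions, vectors):
    """Fraction of questions where the chosen candidate is the gold synonym."""
    correct = sum(
        1
        for target, candidates, gold in questions
        if answer_question(target, candidates, vectors) == gold
    )
    return correct / len(questions)


# Hypothetical toy data: counts of morphosyntactic features observed
# near each noun (invented numbers, not taken from the paper).
vectors = {
    "samochód": Counter({"adj:szybki": 4, "adj:nowy": 3, "verb:jechać": 5}),
    "auto": Counter({"adj:szybki": 3, "adj:nowy": 2, "verb:jechać": 4}),
    "pies": Counter({"adj:wierny": 4, "verb:szczekać": 5, "adj:nowy": 1}),
    "dom": Counter({"adj:nowy": 4, "adj:duży": 3}),
}

# Each question: (target noun, candidate answers, gold synonym).
questions = [("samochód", ["auto", "pies", "dom"], "auto")]

print(answer_question("samochód", ["auto", "pies", "dom"], vectors))
print(accuracy(questions, vectors))
```

In this sketch, selecting features would amount to deciding which keys are kept in each vector before the similarity is computed; how the paper makes that selection is not described in the abstract.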