Semantic similarity measure of Polish nouns based on linguistic features

  • Authors:
  • Maciej Piasecki;Bartosz Broda

  • Affiliations:
  • Institute of Applied Informatics, Wrocław University of Technology, Wrocław, Poland;Institute of Applied Informatics, Wrocław University of Technology, Wrocław, Poland

  • Venue:
  • BIS'07 Proceedings of the 10th international conference on Business information systems
  • Year:
  • 2007

Abstract

A word-to-word similarity function automatically extracted from a corpus of texts can be a very helpful tool in the automatic extraction of lexical semantic relations. There are many approaches for English, but only a few for inflective languages with almost free word order. This paper proposes a method for constructing a similarity function for Polish nouns. The method uses only simple language processing tools (e.g. it does not need a parser). Its core is a matrix of co-occurrences of nouns and adjectives, built by applying morpho-syntactic constraints that test agreement between an adjective and a noun. Several methods of transforming the matrix and calculating the similarity function are presented. The achieved accuracy of 81.15% in the WordNet-based Synonymy Test (for 4 611 Polish nouns, using the current version of PolishWordNet) seems to be comparable with the best results reported for English (e.g. 75.8% [5]).
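The core idea of the abstract, counting noun-adjective co-occurrences that pass an agreement test and comparing the resulting noun vectors, can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the token format `(lemma, pos, case, number, gender)`, the adjacency-only window, the helper names `agrees`, `cooccurrence` and `cosine`, and the use of plain cosine similarity without any matrix transformation.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical token format: (lemma, pos, case, number, gender).
# A real pipeline would take these values from a morphological tagger.

def agrees(adj, noun):
    """Simplified agreement test: case, number and gender must match."""
    return adj[2:] == noun[2:]

def cooccurrence(sentences):
    """Build a sparse noun-by-adjective count matrix.

    Assumption: only adjectives directly adjacent to the noun are checked.
    """
    matrix = defaultdict(Counter)
    for sent in sentences:
        for i, tok in enumerate(sent):
            if tok[1] != "noun":
                continue
            for j in (i - 1, i + 1):
                if 0 <= j < len(sent) and sent[j][1] == "adj" and agrees(sent[j], tok):
                    matrix[tok[0]][sent[j][0]] += 1
    return matrix

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy data, invented for this example.
sentences = [
    [("czarny", "adj", "nom", "sg", "m"), ("pies", "noun", "nom", "sg", "m")],
    [("czarny", "adj", "nom", "sg", "m"), ("kot", "noun", "nom", "sg", "m")],
    [("duży", "adj", "nom", "sg", "m"), ("pies", "noun", "nom", "sg", "m")],
    # Case mismatch (nom vs. acc): the pair fails the agreement test.
    [("szybki", "adj", "nom", "sg", "m"), ("samochód", "noun", "acc", "sg", "m")],
]

matrix = cooccurrence(sentences)
similarity = cosine(matrix["pies"], matrix["kot"])
```

In this toy corpus "pies" and "kot" share the adjective "czarny", so their similarity is non-zero, while "samochód" receives no row at all because its only neighbouring adjective fails the agreement test.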