Informativeness of inflective noun bigrams in Croatian

  • Authors:
  • Damir Jurić;Marko Banek;Šandor Dembitz

  • Affiliations:
  • Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia;Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia;Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia

  • Venue:
  • KES-AMSTA'12 Proceedings of the 6th KES international conference on Agent and Multi-Agent Systems: technologies and applications
  • Year:
  • 2012

Abstract

A feature of Croatian and other Slavic languages is a rich inflection system, which does not exist in English and other languages that traditionally dominate the scientific focus of computational linguistics. In this paper we present the results of experiments conducted on the corpus of the Croatian online spellchecker Hascheck, which point to using non-nominative cases for discovering collocations between two nouns, specifically the first name and the family name of a person. We analyzed the frequencies and conditional probabilities of the morphemes corresponding to Croatian cases and quantified the level of attraction between two words using the normalized pointwise mutual information measure. The two components of a personal name are more likely to co-occur in any of the non-nominative cases than in the nominative. Furthermore, given one component of a personal name, the conditional probability that it is accompanied by the other component of the name is higher for the genitive/accusative and instrumental cases than for the nominative.
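
The two quantities named in the abstract, normalized pointwise mutual information (NPMI) and the conditional probability of one name component given the other, can be illustrated with a minimal sketch. It uses the standard NPMI definition, pmi(x, y) / -log p(x, y), and is not taken from the paper itself. The function names, the maximum-likelihood probability estimates, and the counts in the usage lines are assumptions made for illustration and are not Hascheck data.

```python
import math


def npmi(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """Normalized pointwise mutual information of a bigram (x, y).

    Probabilities are maximum-likelihood estimates from raw counts
    (an assumption of this sketch). The result lies in [-1, 1]:
    -1 means the words never co-occur, 0 means independence,
    1 means they occur only together.
    """
    if count_xy == 0:
        return -1.0  # limiting value when the pair is never observed
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    if p_xy == 1.0:
        return 1.0  # degenerate corpus: every bigram is (x, y)
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)


def conditional_probability(count_xy: int, count_x: int) -> float:
    """Estimate P(y | x): how often x is accompanied by y."""
    return count_xy / count_x if count_x else 0.0


# Hypothetical counts for one first-name/family-name pair in a single case form.
score = npmi(count_xy=40, count_x=50, count_y=60, total=1_000_000)
p_family_given_first = conditional_probability(count_xy=40, count_x=50)
print(score, p_family_given_first)
```

Computing these values separately for each case form of the same name pair would give the kind of per-case comparison the abstract describes.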