Informativeness of inflective noun bigrams in croatian

Authors:
Damir Jurić;Marko Banek;Šandor Dembitz
Affiliations:
Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia;Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia;Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
Venue:
KES-AMSTA'12 Proceedings of the 6th KES international conference on Agent and Multi-Agent Systems: technologies and applications
Year:
2012

Citing 6
Cited 0

Word association norms, mutual information, and lexicography

Computational Linguistics
Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition

Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition
Multiword Expressions: A Pain in the Neck for NLP

CICLing '02 Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing
Advantages of online spellchecking: a Croatian example

Software—Practice & Experience
Named-entity recognition for polish with SProUT

IMTCI'04 Proceedings of the Second international conference on Intelligent Media Technology for Communicative Intelligence
Identification of multi-word expressions by combining multiple linguistic information sources

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

A feature of Croatian and other Slavic languages is a rich inflection system, which does not exist in English and other languages that traditionally dominate the scientific focus of computational linguistics. In this paper we present the results of the experiments conducted on the corpus of the Croatian online spellchecker Hascheck, which point to using non-nominative cases for discovering collocations between two nouns, specifically the first name and the family name of a person. We analyzed the frequencies and conditional probabilities of the morphemes corresponding to Croatian cases and quantified the level of attraction between two words using the normalized pointwise mutual information measure. Two components of a personal name are more likely to co-occur in any of the non-nominative cases than in nominative. Furthermore, given a component of a personal name, the conditional probability that it is accompanied with the other component of the name are higher for the genitive/accusative and instrumental case than for nominative.