Random indexing distributional semantic models for Croatian language

Authors:
Vedrana Janković;Jan Šnajder;Bojana Dalbelo Bašić
Affiliations:
Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia;Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia;Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia
Venue:
TSD'11 Proceedings of the 14th international conference on Text, speech and dialogue
Year:
2011

Abstract

Distributional semantic models (DSMs) model semantic relations between expressions by comparing the contexts in which these expressions occur. This paper presents an extensive evaluation of DSMs for the Croatian language. We focus on random indexing models, an efficient and scalable approach to building DSMs. We build a number of models with different parameters (dimension, context type, and similarity measure) and compare them against human semantic similarity judgments. Our results indicate that even low-dimensional random indexing models may outperform raw frequency models, and that the choice of similarity measure is the most important factor.
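
The abstract gives no implementation details, so the following is only a minimal sketch of how a random indexing model with the three named parameters might be built. It assumes sparse ternary index vectors, a symmetric word window as the context type, and cosine as the similarity measure. The function names, default values, and toy corpus are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import defaultdict


def make_index_vector(dim, nonzeros, rng):
    """Sparse ternary index vector: a few random +1/-1 entries, zeros elsewhere."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzeros):
        vec[pos] = rng.choice((1, -1))
    return vec


def build_model(sentences, dim=300, nonzeros=6, window=2, seed=0):
    """Sum, for each word, the index vectors of its neighbours within the window.

    `dim` is the model dimension and `window` stands in for the context type;
    both are illustrative parameter choices (assumptions).
    """
    rng = random.Random(seed)
    index = {}                                # word -> fixed random index vector
    context = defaultdict(lambda: [0] * dim)  # word -> accumulated context vector
    for tokens in sentences:
        for word in tokens:
            if word not in index:
                index[word] = make_index_vector(dim, nonzeros, rng)
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            vec = context[word]
            for j in range(lo, hi):
                if j == i:
                    continue
                for k, value in enumerate(index[tokens[j]]):
                    if value:
                        vec[k] += value
    return dict(context)


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Toy corpus (hypothetical); a real model would be built from a large corpus.
    corpus = [
        ["pas", "glasno", "laje"],
        ["pas", "brzo", "trči"],
        ["mačka", "tiho", "prede"],
        ["mačka", "brzo", "trči"],
    ]
    model = build_model(corpus, dim=100, nonzeros=4, window=2)
    print(cosine(model["pas"], model["mačka"]))
```

In a sketch like this, changing `dim`, `window`, or the similarity function corresponds to varying the dimension, context type, and similarity measure. The resulting similarity scores could then be compared against human judgments, for example with a rank correlation.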