Contextual correlates of synonymy
Communications of the ACM
Semantic similarity for detecting recognition errors in automatic speech transcripts
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Computing semantic relatedness using Wikipedia-based explicit semantic analysis
IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
How well do semantic relatedness measures perform?: a meta-study
STEP '08 Proceedings of the 2008 Conference on Semantics in Text Processing
Cross-lingual semantic relatedness using encyclopedic knowledge
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3
Despite the growth in the digitization of data, many languages still lack corpora large enough to yield valid measures of semantic similarity. If manually assigned similarity scores from one language could be shown to transfer to another, semantic similarity values could be made available for languages with fewer resources. We test an automatic word similarity measure based on second-order co-occurrences in the Google ngram corpus for English, German, and French. We show that the scores manually assigned in Rubenstein and Goodenough's experiments for 65 English word pairs can be transferred directly into German and French. We do this by conducting human evaluation experiments for French word pairs (and by using similarly produced scores for German). We show that the correlation between the automatically assigned semantic similarity scores and the scores assigned by human evaluators is not very different when Rubenstein and Goodenough's scores are used across languages, compared to when language-specific scores are used.
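To make the evaluation setup concrete, the following is a minimal sketch of one way a second-order co-occurrence measure and its comparison against human scores might be computed. It is not the authors' implementation: the co-occurrence weighting, the use of cosine similarity, the choice of Pearson correlation, the function names, and the toy n-gram counts are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def context_vectors(ngrams):
    """Build a co-occurrence vector per word from (tokens, count) n-gram records.

    Assumption: every other token in the same n-gram counts as context,
    weighted by the n-gram's frequency.
    """
    vectors = {}
    for tokens, count in ngrams:
        for i, word in enumerate(tokens):
            vec = vectors.setdefault(word, Counter())
            for j, context in enumerate(tokens):
                if i != j:
                    vec[context] += count
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Toy n-gram counts, invented purely for illustration (not corpus data).
TOY_NGRAMS = [
    (("drive", "the", "car"), 10),
    (("drive", "the", "automobile"), 4),
    (("eat", "the", "fruit"), 6),
]

if __name__ == "__main__":
    vectors = context_vectors(TOY_NGRAMS)
    # Two words are second-order similar when their context vectors overlap,
    # even if the words themselves never appear in the same n-gram.
    print(cosine(vectors["car"], vectors["automobile"]))
    print(cosine(vectors["car"], vectors["fruit"]))
```

Under these assumptions, the comparison described in the abstract amounts to calling `pearson` twice for one target language: once with the automatic scores against the transferred Rubenstein and Goodenough scores, and once against the scores collected from speakers of that language, and then checking how far the two coefficients differ.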