Extracting sense-disambiguated example sentences from parallel corpora

Authors:
Gerard de Melo; Gerhard Weikum
Affiliations:
Max Planck Institute for Informatics, Saarbrücken, Germany (both authors)
Venue:
WDE '09 Proceedings of the 1st Workshop on Definition Extraction
Year:
2009

Abstract

Example sentences provide an intuitive means of grasping the meaning of a word, and are frequently used to complement conventional word definitions. When a word has multiple meanings, it is useful to have example sentences for specific senses (and hence definitions) of that word rather than indiscriminately lumping all of them together. In this paper, we investigate to what extent such sense-specific example sentences can be extracted from parallel corpora using lexical knowledge bases for multiple languages as a sense index. We use word sense disambiguation heuristics and a cross-lingual measure of semantic similarity to link example sentences to specific word senses. From the sentences found for a given sense, an algorithm then selects a smaller subset that can be presented to end users, taking into account both representativeness and diversity. Preliminary results show that a precision of around 80% can be obtained for a reasonable number of word senses, and that the subset selection yields convincing results.
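The idea of linking an example sentence to a specific sense through its translation can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that word alignment has already paired a source word with its translation. It also assumes that the lexical knowledge bases for both languages share one sense inventory. The lexicon `LEXICON`, the relatedness table `RELATED`, the functions `sense_similarity` and `link_sense`, and the `threshold` parameter are all hypothetical. The sense IDs are illustrative only.

```python
from typing import Optional

# Hypothetical toy multilingual lexicon: (language, lemma) -> candidate senses.
# Sense IDs are shared across languages, as if both lexical knowledge bases were
# mapped to one common sense inventory (an assumption; IDs are illustrative).
LEXICON: dict[tuple[str, str], set[str]] = {
    ("en", "bank"): {"bank.n.01", "bank.n.02"},   # sloping land / financial institution
    ("de", "Bank"): {"bank.n.02", "bench.n.01"},  # financial institution / bench
    ("de", "Ufer"): {"bank.n.01"},                # sloping land beside water
}

# Hypothetical graded relatedness for distinct senses; unlisted pairs score 0.0.
RELATED: dict[frozenset[str], float] = {
    frozenset({"bank.n.01", "bench.n.01"}): 0.1,
}


def sense_similarity(a: str, b: str) -> float:
    """Stand-in cross-lingual similarity: 1.0 for a shared sense, else a table lookup."""
    if a == b:
        return 1.0
    return RELATED.get(frozenset({a, b}), 0.0)


def link_sense(
    src: tuple[str, str], tgt: tuple[str, str], threshold: float = 0.5
) -> Optional[str]:
    """Pick the sense of the source word best supported by its aligned translation.

    Returns None when no sense reaches `threshold`, i.e. the sentence pair is
    not used as an example for any sense of the source word.
    """
    tgt_senses = LEXICON.get(tgt, set())
    best_sense, best_score = None, 0.0
    for sense in sorted(LEXICON.get(src, set())):  # sorted: deterministic ties
        score = max((sense_similarity(sense, t) for t in tgt_senses), default=0.0)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense if best_score >= threshold else None


print(link_sense(("en", "bank"), ("de", "Ufer")))   # bank.n.01
print(link_sense(("en", "bank"), ("de", "Bank")))   # bank.n.02
print(link_sense(("en", "bank"), ("de", "Tisch")))  # None (translation unknown)
```

Requiring support from the aligned word lets the parallel corpus act as a disambiguation signal. For example, the German translation "Ufer" rules out the financial sense of "bank". In a sketch like this, raising `threshold` links fewer sentences but keeps only the better-supported links.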
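The abstract names the two criteria for choosing a displayable subset, representativeness and diversity, but not how they are combined. One common way to combine them is a greedy trade-off in the spirit of maximal marginal relevance (MMR), swapped in here purely for illustration. The sketch scores each candidate by its mean similarity to the whole pool, minus its similarity to sentences already chosen. Token-set Jaccard overlap stands in for whatever sentence similarity the paper actually uses. The names `select_examples` and `jaccard` and the weight `lam` are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Token-set overlap in [0, 1]; 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_examples(sentences: list[str], k: int, lam: float = 0.5) -> list[str]:
    """Greedily pick up to k sentences that are representative of the pool
    (high mean similarity to the other candidates) yet unlike those already picked."""
    tokens = [set(s.lower().split()) for s in sentences]
    n = len(sentences)
    # Representativeness: mean similarity to every other candidate for the sense.
    rep = [
        sum(jaccard(tokens[i], tokens[j]) for j in range(n) if j != i) / max(n - 1, 1)
        for i in range(n)
    ]
    chosen: list[int] = []
    remaining = list(range(n))
    while remaining and len(chosen) < k:
        # Redundancy: highest similarity to any sentence selected so far.
        best = max(
            remaining,
            key=lambda i: lam * rep[i]
            - (1 - lam) * max((jaccard(tokens[i], tokens[j]) for j in chosen), default=0.0),
        )
        chosen.append(best)
        remaining.remove(best)
    return [sentences[i] for i in chosen]


pool = [
    "she deposited money at the bank",
    "he deposited money at the bank today",
    "the bank approved the loan",
]
print(select_examples(pool, k=2))
```

With these three sentences, the first pick is "she deposited money at the bank", the most representative one. The second pick is "the bank approved the loan" rather than the near-duplicate "he deposited money at the bank today", because the redundancy penalty outweighs the duplicate's higher representativeness. Increasing `lam` shifts the balance toward representativeness. At `lam=0.7` this toy example would pick the near-duplicate instead.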