Word sense acquisition from bilingual comparable corpora

Authors:
Hiroyuki Kaji
Affiliations:
Central Research Laboratory, Hitachi, Ltd., Kokubunji-shi, Tokyo, Japan
Venue:
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Year:
2003

Abstract

Manually constructed inventories of word senses suffer from several problems, including high cost, arbitrary assignment of meanings to words, and mismatch with the domain of application. To overcome these problems, we propose a method that acquires word senses from a bilingual comparable corpus and a bilingual dictionary. The method clusters the second-language translation equivalents of a first-language target word on the basis of their translingually aligned distribution patterns. It thus produces a hierarchy of corpus-relevant meanings of the target word, each defined by a set of translation equivalents. The effectiveness of the method was demonstrated in an experiment using a comparable corpus consisting of Wall Street Journal and Nihon Keizai Shimbun corpora, together with the EDR bilingual dictionary.
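
The clustering step described in the abstract can be illustrated with a small sketch. The abstract does not state which similarity measure or clustering algorithm is used, so cosine similarity and average-link agglomerative clustering are assumptions made here purely for illustration. The target word ("bank"), its romanized Japanese translation equivalents, and all weights are invented toy data, and the distribution patterns are assumed to be already aligned to first-language words via the bilingual dictionary.

```python
from math import sqrt

# Toy data (invented for illustration): each translation equivalent of the
# hypothetical target word "bank" maps to weights over first-language
# (English) associated words. Real patterns would come from the corpora.
patterns = {
    "ginkou": {"loan": 0.9, "deposit": 0.8},   # financial institution
    "banku":  {"loan": 0.7, "deposit": 0.9},   # loanword for the same sense
    "dote":   {"river": 0.9, "flood": 0.6},    # river bank
    "teibou": {"river": 0.8, "flood": 0.9},    # embankment
}

def cosine(u, v):
    """Cosine similarity of two sparse vectors (assumed similarity measure)."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster_equivalents(patterns, min_sim=0.0):
    """Average-link agglomerative clustering of translation equivalents.

    Returns (merges, clusters): the merge history, which encodes the
    hierarchy, and the clusters left when no pair reaches min_sim.
    """
    clusters = [frozenset([w]) for w in sorted(patterns)]
    merges = []

    def avg_sim(a, b):
        total = sum(cosine(patterns[x], patterns[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the most similar pair of current clusters.
        sim, i, j = max(
            ((avg_sim(a, b), i, j)
             for i, a in enumerate(clusters)
             for j, b in enumerate(clusters) if i < j),
            key=lambda t: t[0],
        )
        if sim < min_sim:
            break
        merges.append((sim, clusters[i], clusters[j]))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges, clusters

if __name__ == "__main__":
    merges, senses = cluster_equivalents(patterns, min_sim=0.5)
    for sense in senses:
        print(sorted(sense))
```

With the stopping threshold set to 0.5 (an arbitrary choice), the toy data separates into a financial sense and a river-bank sense, each defined by a set of translation equivalents. Running with the default threshold continues merging up to a single root, so the merge history gives the full hierarchy.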