We address two problems in the statistical extraction of bilingual lexicons: (1) How can noisy parallel corpora be used? (2) How can non-parallel but comparable corpora be used? We describe our own work and contributions toward relaxing the constraint of using only clean parallel corpora. DKvec is a method for extracting bilingual lexicons from noisy parallel corpora based on the arrival distances of words. Using DKvec on noisy parallel corpora in English/Japanese and English/Chinese, our evaluations show 55.35% precision on a small corpus and 89.93% precision on a larger corpus. Our major contribution is the extraction of bilingual lexicons from non-parallel corpora. We present a first result in this area, obtained with a new method, Convec, which is based on the context information of the word to be translated. We obtain 30% to 76% precision when the top-1 to top-20 translation candidates are considered. Most of the top-20 candidates are either collocations or words related to the correct translation. Since non-parallel corpora contain many more polysemous words, many-to-many translations, and different lexical items in the two languages, we conclude that the output from Convec is reasonable and useful.
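The two signals described above can be illustrated with a minimal sketch. This is not the authors' implementation. For the DKvec-style part, we assume that arrival distances are the gaps between successive occurrences of a word, normalized by text length, and that the two languages' gap sequences are compared with dynamic time warping (DTW). For the Convec-style part, we assume raw-count context vectors over a fixed window, a hypothetical seed bilingual lexicon (`seed_lexicon`) used to map source context words into the target language, and cosine similarity for ranking candidates. All function names, the window size, and the toy data are illustrative. `precision_at_k` is our reading of the "top-1 to top-20" evaluation: the fraction of test words whose correct translation appears among the top k candidates.

```python
import math
from collections import Counter

# --- DKvec-style signal (assumed: length-normalized gaps compared with DTW) ---

def positions(tokens, word):
    """Token indices at which `word` occurs."""
    return [i for i, t in enumerate(tokens) if t == word]

def arrival_distances(tokens, word):
    """Gaps between successive occurrences of `word`, divided by text length.
    The normalization is an assumption so texts of different lengths compare."""
    pos = positions(tokens, word)
    n = len(tokens)
    return [(b - a) / n for a, b in zip(pos, pos[1:])]

def dtw_distance(x, y):
    """Dynamic time warping distance with |a - b| local cost.
    Returns inf if exactly one of the sequences is empty."""
    n, m = len(x), len(y)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

# --- Convec-style context vectors (assumed: counts, seed lexicon, cosine) ---

def context_vector(tokens, word, window=3):
    """Counts of tokens within `window` positions of each occurrence of `word`."""
    vec = Counter()
    for i in positions(tokens, word):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vec[tokens[j]] += 1
    return vec

def translate_vector(vec, seed_lexicon):
    """Map source context words into the target language via a (hypothetical)
    seed lexicon; context words with no seed translation are dropped."""
    out = Counter()
    for w, c in vec.items():
        if w in seed_lexicon:
            out[seed_lexicon[w]] += c
    return out

def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(c * v.get(w, 0) for w, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_candidates(src_word, src_tokens, tgt_tokens, candidates,
                    seed_lexicon, window=3):
    """Rank target candidates by context similarity to `src_word`;
    ties are broken alphabetically."""
    src_vec = translate_vector(context_vector(src_tokens, src_word, window),
                               seed_lexicon)
    scored = [(cosine(src_vec, context_vector(tgt_tokens, c, window)), c)
              for c in candidates]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [c for _, c in scored]

def precision_at_k(ranked_lists, gold, k):
    """Fraction of test words whose gold translation is in the top-k list."""
    hits = sum(1 for w, ranked in ranked_lists.items() if gold[w] in ranked[:k])
    return hits / len(ranked_lists)

if __name__ == "__main__":
    # Toy, hand-made English/French data; not from the paper's corpora.
    src = "the big bank raises rates and the big bank cuts fees".split()
    tgt = "la grande banque relève les taux et la grande banque baisse les frais".split()
    seed = {"big": "grande", "raises": "relève", "cuts": "baisse",
            "rates": "taux", "fees": "frais"}
    print(rank_candidates("bank", src, tgt, ["taux", "frais", "banque"], seed, window=1))
    # → ['banque', 'frais', 'taux']
```

In the toy run, "bank" and "banque" share the translated context {grande, relève, baisse}, so "banque" ranks first; the other two candidates share no seed-translated context and tie at 0.0. Real comparable corpora would need term weighting, larger windows, and a much larger seed lexicon, none of which this sketch attempts.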