A statistical approach to machine translation
Computational Linguistics
Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Estimating Word Translation Probabilities from Unrelated Monolingual Corpora Using the EM Algorithm
Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence
Fast and Accurate Sentence Alignment of Bilingual Corpora
AMTA '02 Proceedings of the 5th Conference of the Association for Machine Translation in the Americas on Machine Translation: From Research to Real Users
A systematic comparison of various statistical alignment models
Computational Linguistics
Adaptive Parallel Sentences Mining from Web Bilingual News Collection
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Computational Linguistics - Special issue on web as corpus
The mathematics of statistical machine translation: parameter estimation
Computational Linguistics - Special issue on using large corpora: II
Bitext maps and alignment via pattern recognition
Computational Linguistics
A portable algorithm for mapping bitext correspondence
ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
An IR approach for translating new words from nonparallel, comparable texts
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
A program for aligning sentences in bilingual corpora
ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
Aligning a parallel English-Chinese corpus statistically with lexical criteria
ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics
K-vec: a new approach for aligning parallel texts
COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 2
Automatic identification of word translations from unrelated English and German corpora
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Using noisy bilingual data for statistical machine translation
EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 2
Inducing multilingual text analysis tools via robust projection across aligned corpora
HLT '01 Proceedings of the first international conference on Human language technology research
An unsupervised method for word sense tagging using parallel corpora
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Discriminative training and maximum entropy models for statistical machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Inducing multilingual POS taggers and NP bracketers via robust projection across aligned corpora
NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
A noisy-channel approach to question answering
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Reliable measures for aligning Japanese-English news articles and sentences
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Minimum error rate training in statistical machine translation
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
A comparison of algorithms for maximum entropy parameter estimation
COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20
Sentence alignment for monolingual comparable corpora
EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing
Improving IBM word-alignment model 1
ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
A geometric view on bilingual lexicon extraction from comparable corpora
ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Extracting parallel sub-sentential fragments from non-parallel corpora
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Domain adaptation for statistical machine translation with domain dictionary and monolingual corpora
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
On the use of comparable corpora to improve SMT performance
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Evaluation of the bible as a resource for cross-language information retrieval
MLRI '06 Proceedings of the Workshop on Multilingual Language Resources and Interoperability
Language and translation model adaptation using comparable corpora
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
A simple sentence-level extraction algorithm for comparable data
NAACL-Short '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers
Domain adaptation for statistical classifiers
Journal of Artificial Intelligence Research
An Intelligent Agent That Autonomously Learns How to Translate
WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 02
Frontiers in linguistic annotation for lower-density languages
LAC '06 Proceedings of the Workshop on Frontiers in Linguistically Annotated Corpora 2006
A beam-search extraction algorithm for comparable data
ACLShort '09 Proceedings of the ACL-IJCNLP 2009 Conference Short Papers
WikiBABEL: a wiki-style platform for creation of parallel data
ACLDemos '09 Proceedings of the ACL-IJCNLP 2009 Software Demonstrations
Mining bilingual data from the web with adaptively learnt patterns
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2
Train the machine with what it can learn: corpus selection for SMT
BUCC '09 Proceedings of the 2nd Workshop on Building and Using Comparable Corpora: from Parallel to Non-parallel Corpora
Exploiting comparable corpora with TER and TERp
BUCC '09 Proceedings of the 2nd Workshop on Building and Using Comparable Corpora: from Parallel to Non-parallel Corpora
Improved statistical machine translation using monolingually-derived paraphrases
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1
wikiBABEL: community creation of multilingual data
WikiSym '08 Proceedings of the 4th International Symposium on Wikis
Extracting parallel sentences from comparable corpora using document level alignment
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
A Collection of Comparable Corpora for Under-resourced Languages
Proceedings of the 2010 conference on Human Language Technologies -- The Baltic Perspective: Proceedings of the Fourth International Conference Baltic HLT 2010
Translingual document representations from discriminative projections
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Example-based paraphrasing for improved phrase-based statistical machine translation
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Extracting parallel fragments from comparable corpora for data-to-text generation
INLG '10 Proceedings of the 6th International Natural Language Generation Conference
An empirical study on web mining of parallel data
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Large scale parallel document mining for machine translation
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Contextual modeling for meeting translation using unsupervised word sense disambiguation
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Creating a Persian-English comparable corpus
CLEF'10 Proceedings of the 2010 international conference on Multilingual and multimodal information access evaluation: cross-language evaluation forum
A kernel regression framework for SMT
Machine Translation
A survey of paraphrasing and textual entailment methods
Journal of Artificial Intelligence Research
Enhancing multi-lingual information extraction via cross-media inference and fusion
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
EM-based hybrid model for bilingual terminology extraction from comparable corpora
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Joint bilingual sentiment classification with unlabeled parallel corpora
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Crowdsourcing translation: professional quality from non-professionals
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Rare word translation extraction from aligned comparable documents
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
From bilingual dictionaries to interlingual document representations
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Two easy improvements to lexical weighting
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Using Sublexical Translations to Handle the OOV Problem in Machine Translation
ACM Transactions on Asian Language Information Processing (TALIP)
No free lunch: brute force vs. locality-sensitive hashing for cross-lingual pairwise similarity
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Learning discriminative projections for text similarity measures
CoNLL '11 Proceedings of the Fifteenth Conference on Computational Natural Language Learning
Bilingual lexicon extraction from comparable corpora enhanced with parallel corpora
BUCC '11 Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web
Two ways to use a noisy parallel news corpus for improving statistical machine translation
BUCC '11 Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web
Paraphrase fragment extraction from monolingual comparable corpora
BUCC '11 Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web
Extracting parallel phrases from comparable data
BUCC '11 Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web
Active learning with multiple annotations for comparable data classification task
BUCC '11 Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web
BUCC '11 Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web
Unsupervised alignment of comparable data and text resources
BUCC '11 Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web
An Expectation Maximization algorithm for textual unit alignment
BUCC '11 Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web
Graph-based bilingual sentence alignment from large scale web pages
NLDB'11 Proceedings of the 16th international conference on Natural language processing and information systems
Cross-lingual text fragment alignment using divergence from randomness
SPIRE'11 Proceedings of the 18th international conference on String processing and information retrieval
Parallel sentence generation from comparable corpora for improved SMT
Machine Translation
Translation selection through machine learning with language resources
ICCPOL'06 Proceedings of the 21st international conference on Computer Processing of Oriental Languages: beyond the orient: the research challenges ahead
A minimally supervised approach for detecting and ranking document translation pairs
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
The Karlsruhe Institute of Technology translation systems for the WMT 2011
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
CMU Haitian Creole-English translation system for WMT 2011
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Improving bilingual projections via sparse covariance matrices
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Machine translation between Hebrew and Arabic
Machine Translation
Enabling users to create their own web-based machine translation engine
Proceedings of the 21st international conference companion on World Wide Web
Topic based creation of a persian-english comparable corpus
AIRS'11 Proceedings of the 7th Asia conference on Information Retrieval Technology
Detecting highly confident word translations from comparable corpora without any prior knowledge
EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
HDU: cross-lingual textual entailment with SMT features
SemEval '12 Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation
EACL 2012 Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra)
Design of a hybrid high quality machine translation system
EACL 2012 Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra)
Translation model adaptation for statistical machine translation with monolingual topic information
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Cross-lingual mixture model for sentiment classification
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Twitter translation using translation-based cross-lingual retrieval
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
A language modeling approach for extracting translation knowledge from comparable corpora
ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume 2
CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume 2
Distributional phrasal paraphrase generation for statistical machine translation
ACM Transactions on Intelligent Systems and Technology (TIST) - Special Sections on Paraphrasing; Intelligent Systems for Socially Aware Computing; Social Computing, Behavioral-Cultural Modeling, and Prediction
Mining a Persian-English comparable corpus for cross-language information retrieval
Information Processing and Management: an International Journal
An intelligent Web agent that autonomously learns how to translate
Web Intelligence and Agent Systems
Hi-index | 0.00 |
We present a novel method for discovering parallel sentences in comparable, non-parallel corpora. We train a maximum entropy classifier that, given a pair of sentences, can reliably determine whether or not they are translations of each other. Using this approach, we extract parallel data from large Chinese, Arabic, and English non-parallel newspaper corpora. We evaluate the quality of the extracted data by showing that it improves the performance of a state-of-the-art statistical machine translation system. We also show that a good-quality MT system can be built from scratch by starting with a very small parallel corpus (100,000 words) and exploiting a large non-parallel corpus. Thus, our method can be applied with great benefit to language pairs for which only scarce resources are available.