Adaptive Parallel Sentences Mining from Web Bilingual News Collection
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
The Journal of Machine Learning Research
The mathematics of statistical machine translation: parameter estimation
Computational Linguistics - Special issue on using large corpora: II
An efficient method for determining bilingual word classes
EACL '99 Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics
The KANT system: fast, accurate, high-quality translation in practical domains
COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 3
HMM-based word alignment in statistical translation
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
A syntax-based statistical translation model
ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Phrasal cohesion and statistical machine translation
EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
Bilingual word spectral clustering for statistical machine translation
ParaText '05 Proceedings of the ACL Workshop on Building and Using Parallel Texts
Bilingual LSA-based adaptation for statistical machine translation
Machine Translation
Domain adaptation in statistical machine translation with mixture modelling
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Cross-language linking of news stories on the web using interlingual topic modelling
Proceedings of the 2nd ACM workshop on Social web search and mining
Multilingual topic models for unaligned text
UAI '09 Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence
Cross-lingual latent topic extraction
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Word alignment with synonym regularization
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Holistic sentiment analysis across languages: multilingual supervised latent Dirichlet allocation
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Bayesian word alignment for statistical machine translation
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Knowledge transfer across multilingual corpora via latent topics
PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
Cache-based document-level statistical machine translation
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Latent association analysis of document pairs
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Translation model adaptation for statistical machine translation with monolingual topic information
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
A topic similarity model for hierarchical phrase-based translation
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Topic models for dynamic translation model adaptation
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2
Discovering a lexicon of parts and attributes
ECCV'12 Proceedings of the 12th international conference on Computer Vision - Volume Part III
Hi-index | 0.00 |
We propose a novel bilingual topical admixture (BiTAM) formalism for word alignment in statistical machine translation. Under this formalism, the parallel sentence-pairs within a document-pair are assumed to constitute a mixture of hidden topics; each word-pair follows a topic-specific bilingual translation model. Three BiTAM models are proposed to capture topic sharing at different levels of linguistic granularity (i.e., at the sentence or word levels). These models enable word-alignment process to leverage topical contents of document-pairs. Efficient variational approximation algorithms are designed for inference and parameter estimation. With the inferred latent topics, BiTAM models facilitate coherent pairing of bilingual linguistic entities that share common topical aspects. Our preliminary experiments show that the proposed models improve word alignment accuracy, and lead to better translation quality.