We develop the multilingual topic model for unaligned text (MuTo), a probabilistic model of text that is designed to analyze corpora composed of documents in two languages. From these documents, MuTo uses stochastic EM to simultaneously discover both a matching between the languages and multilingual latent topics. We demonstrate that MuTo is able to find shared topics on real-world multilingual corpora, successfully pairing related documents across languages. MuTo provides a new framework for creating multilingual topic models without needing carefully curated parallel corpora and allows applications built using the topic model formalism to be applied to a much wider class of corpora.