Translingual document representations from discriminative projections

  • Authors:
  • John C. Platt, Kristina Toutanova, Wen-tau Yih

  • Affiliations:
  • Microsoft Research, Redmond, WA

  • Venue:
  • EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
  • Year:
  • 2010


Abstract

Representing documents by vectors that are independent of language enhances machine translation and multilingual text categorization. We use discriminative training to create a projection of documents from multiple languages into a single translingual vector space. We explore two variants to create these projections: Oriented Principal Component Analysis (OPCA) and Coupled Probabilistic Latent Semantic Analysis (CPLSA). Both of these variants start with a basic model of documents (PCA and PLSA). Each model is then made discriminative by encouraging comparable document pairs to have similar vector representations. We evaluate these algorithms on two tasks: parallel document retrieval for Wikipedia and Europarl documents, and cross-lingual text classification on Reuters. The two discriminative variants, OPCA and CPLSA, significantly outperform their corresponding baselines. The largest differences in performance are observed on the task of retrieval when the documents are only comparable and not parallel. The OPCA method is shown to perform best.
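The abstract describes OPCA as a discriminative variant of PCA in which comparable document pairs are encouraged to have similar representations. A minimal sketch of that idea is given below, assuming the standard OPCA formulation as a generalized eigenproblem that trades off document ("signal") covariance against the covariance of differences between comparable pairs ("noise"); the function name, regularization, and matrix shapes are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np
from scipy.linalg import eigh

def opca_projection(X_src, X_tgt, k, reg=1e-3):
    """Illustrative OPCA-style discriminative projection.

    X_src, X_tgt: (n_pairs, d) feature matrices of comparable document
    pairs, already mapped into a shared feature space (assumed setup).
    Returns a (d, k) projection favoring directions with high variance
    across documents but low variance within comparable pairs.
    """
    # Signal covariance: variance over all documents.
    X_all = np.vstack([X_src, X_tgt])
    X_all = X_all - X_all.mean(axis=0)
    C_signal = X_all.T @ X_all / X_all.shape[0]

    # Noise covariance: variance of differences between comparable pairs.
    D = X_src - X_tgt
    C_noise = D.T @ D / D.shape[0]
    # Regularize so the noise covariance is well-conditioned.
    C_noise += reg * np.eye(C_noise.shape[0])

    # Generalized eigenproblem: maximize signal variance relative to noise.
    vals, vecs = eigh(C_signal, C_noise)
    # Eigenvalues come back in ascending order; keep the top-k directions.
    return vecs[:, -k:][:, ::-1]
```

Documents in either language can then be mapped into the shared translingual space by multiplying their feature vectors with the returned projection matrix.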