Probabilistic latent topic models have recently enjoyed much success in extracting and analyzing latent topics in text in an unsupervised way. One common deficiency of existing topic models, though, is that they do not work well for extracting cross-lingual latent topics, simply because words in different languages generally do not co-occur with each other. In this paper, we propose a way to incorporate a bilingual dictionary into a probabilistic topic model so that topic models can be applied to extract shared latent topics from text data in different languages. Specifically, we propose a new topic model called Probabilistic Cross-Lingual Latent Semantic Analysis (PCLSA), which extends the Probabilistic Latent Semantic Analysis (PLSA) model by regularizing its likelihood function with soft constraints derived from a bilingual dictionary. Both qualitative and quantitative experimental results show that the PCLSA model can effectively extract cross-lingual latent topics from multilingual text data.
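The idea of regularizing the PLSA likelihood with dictionary-based soft constraints can be illustrated with a small sketch. The abstract does not state the exact form of the regularizer, so everything specific below is an assumption made for illustration: the squared-difference penalty over translation pairs, the mixing weight `lam`, the function names, and the toy four-word vocabulary. It is not presented as the objective actually used by PCLSA.

```python
import numpy as np


def plsa_log_likelihood(counts, doc_topic, topic_word):
    """Standard PLSA log-likelihood.

    counts:     (D, V) term counts
    doc_topic:  (D, K) p(topic | doc), rows sum to 1
    topic_word: (K, V) p(word | topic), rows sum to 1
    """
    word_prob = doc_topic @ topic_word  # p(word | doc), shape (D, V)
    mask = counts > 0                   # skip unseen words to avoid log(0)
    return float(np.sum(counts[mask] * np.log(word_prob[mask])))


def dictionary_penalty(topic_word, translation_pairs):
    """Assumed soft constraint: a word and its translation should have
    similar probability in every topic. Each pair is (u, v, weight)."""
    penalty = 0.0
    for u, v, weight in translation_pairs:
        diff = topic_word[:, u] - topic_word[:, v]
        penalty += weight * float(diff @ diff)
    return 0.5 * penalty


def regularized_objective(counts, doc_topic, topic_word,
                          translation_pairs, lam=0.5):
    """Assumed combination: reward likelihood, penalize dictionary violations."""
    ll = plsa_log_likelihood(counts, doc_topic, topic_word)
    reg = dictionary_penalty(topic_word, translation_pairs)
    return (1.0 - lam) * ll - lam * reg


# Hypothetical vocabulary: 0 = "dog", 1 = "bank", 2 = "chien", 3 = "banque".
pairs = [(0, 2, 1.0), (1, 3, 1.0)]
counts = np.array([[2.0, 1.0, 0.0, 0.0],   # an English document
                   [0.0, 0.0, 2.0, 1.0]])  # a French document
doc_topic = np.full((2, 2), 0.5)

# Topics that split by language versus topics shared across languages.
split = np.array([[0.5, 0.5, 0.0, 0.0],
                  [0.0, 0.0, 0.5, 0.5]])
shared = np.array([[0.4, 0.1, 0.4, 0.1],
                   [0.1, 0.4, 0.1, 0.4]])

for name, topics in [("split", split), ("shared", shared)]:
    print(name,
          plsa_log_likelihood(counts, doc_topic, topics),
          regularized_objective(counts, doc_topic, topics, pairs))
```

In this toy setup both topic matrices give every word the same probability in every document, so plain PLSA cannot distinguish them. Only the dictionary term separates the language-specific topics from the shared ones, which is the deficiency the abstract describes.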