Many applications generate large volumes of parallel document collections. A parallel document collection consists of two sets of documents in which the documents in one set correspond to those in the other and form semantic pairs (e.g., pairs of problem and solution descriptions in a help-desk setting). Although much work has been done on text mining, little previous work has attempted to mine this kind of text data. In this paper, we propose a new probabilistic topic model, called the Probabilistic Topic Mapping (PTM) model, which mines parallel document collections to simultaneously discover latent topics in both sets of documents and the mapping of topics in one set to those in the other. We evaluate the PTM model on one real parallel document collection from the IT service domain. We show that PTM can effectively discover meaningful topics and their mappings, and that it is also useful for improving text matching and retrieval when there is a vocabulary gap.