PTM: probabilistic topic mapping model for mining parallel document collections

Authors:
Duo Zhang;Jimeng Sun;ChengXiang Zhai;Abhijit Bose;Nikos Anerousis
Affiliations:
University of Illinois at Urbana-Champaign, Urbana, IL, USA;IBM T.J. Watson Research Center, Watson, NY, USA;University of Illinois at Urbana-Champaign, Urbana, IL, USA;IBM T.J. Watson Research Center, Watson, NY, USA;IBM T.J. Watson Research Center, Watson, NY, USA
Venue:
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Year:
2010



Abstract

Many applications generate large volumes of parallel document collections. A parallel document collection consists of two sets of documents in which each document in one set corresponds to a document in the other, forming semantic pairs (e.g., pairs of problem and solution descriptions in a help-desk setting). Although much work has been done on text mining, little previous work has attempted to mine this kind of text data. In this paper, we propose a new probabilistic topic model, called the Probabilistic Topic Mapping (PTM) model, that mines parallel document collections to simultaneously discover latent topics in both sets of documents and the mapping of topics in one set to those in the other. We evaluate the PTM model on one real parallel document collection in the IT service domain. We show that PTM can effectively discover meaningful topics and their mappings, and that it is also useful for improving text matching and retrieval when there is a vocabulary gap.