Topic models such as the aspect model (PLSA) and Latent Dirichlet Allocation (LDA) have been shown to be a promising approach to text modeling. Unlike many earlier models that restrict each document to a single topic, topic models support the important idea that a document can be relevant to multiple topics, which makes them significantly more expressive for modeling text documents. However, we observe two limitations of topic models. The first is scalability: running these models on large corpora is extremely expensive. The second is their inability to model the key concept of relevance. This prevents them from being applied directly to tasks such as text classification and relevance feedback for query modification, in which items relevant to the topics (classes or queries) are provided up front. The first aim of this paper is to sketch solutions to these limitations. To alleviate the scalability problem, we introduce a one-scan topic model that requires only a single pass over a corpus for inference. To overcome the lack of relevance modeling, we propose relevance-based topic models that retain the advantages of previous models while taking the concept of relevance into account. The second aim, building on the proposed models, is to revisit a wide range of well-known but still open text-related tasks and outline our vision of how approaches to these tasks could be improved by topic models.