Topic models and a revisit of text-related applications

  • Authors:
  • Viet Ha-Thuc, Padmini Srinivasan

  • Affiliations:
  • The University of Iowa, Iowa City, IA, USA (both authors)

  • Venue:
  • Proceedings of the 2nd PhD Workshop on Information and Knowledge Management
  • Year:
  • 2008

Abstract

Topic models such as the aspect model and LDA have been shown to be a promising approach for text modeling. Unlike many earlier models that restrict each document to a single topic, topic models support the important idea that each document can be relevant to multiple topics. This makes topic models significantly more expressive in modeling text documents. However, we observe two limitations in topic models. One is scalability: it is extremely expensive to run the models on large corpora. The other is the inability to model the key concept of relevance. This prevents the models from being directly applied to tasks such as text classification and relevance feedback for query modification; in these tasks, items relevant to the topics (classes and queries) are provided upfront. The first aim of this paper is to sketch solutions to these limitations. To alleviate the scalability problem, we introduce a one-scan topic model requiring only a single pass over a corpus for inference. To overcome the latter limitation, we propose relevance-based topic models that retain the advantages of previous models while taking the concept of relevance into account. The second aim, based on the proposed models, is to revisit a wide range of well-known but still open text-related tasks and to outline our vision of how approaches to these tasks could be improved by topic models.
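
To illustrate the two ideas the abstract refers to, per-document topic mixtures and single-pass (online) inference, below is a minimal sketch using scikit-learn's online variational LDA. This is not the paper's one-scan or relevance-based model; the toy corpus and parameter choices are hypothetical and only meant to show the general mechanics.

```python
# Minimal sketch (not the paper's method): online LDA in scikit-learn.
# The "online" learning method updates the model from mini-batches, so the
# corpus is effectively processed in a single scan.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical toy corpus; the paper targets large real-world collections.
docs = [
    "topic models assign each document a mixture of topics",
    "relevance feedback expands a query with related terms",
    "text classification labels documents with predefined classes",
    "latent dirichlet allocation infers topic distributions per document",
]

# Bag-of-words representation of the documents.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# Online variational LDA with two latent topics (an arbitrary choice here).
lda = LatentDirichletAllocation(
    n_components=2,
    learning_method="online",
    batch_size=2,
    random_state=0,
)

# Each row is a distribution over topics for one document, i.e. a document
# may be associated with several topics rather than a single label.
doc_topic = lda.fit_transform(X)
print(doc_topic.round(2))
```

The per-document topic distributions returned by `fit_transform` capture the multi-topic representation the abstract highlights; modeling relevance to externally given classes or queries, as in the proposed relevance-based topic models, is not covered by this off-the-shelf sketch.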