Locally discriminative topic modeling

  • Authors:
  • Hao Wu; Jiajun Bu; Chun Chen; Jianke Zhu; Lijun Zhang; Haifeng Liu; Can Wang; Deng Cai

  • Affiliations:
  • College of Computer Science, Zhejiang University, Hangzhou 310027, China (all authors)

  • Venue:
  • Pattern Recognition
  • Year:
  • 2012

Abstract

Topic modeling is a powerful tool for discovering the underlying or hidden structure in text corpora. Typical algorithms for topic modeling include probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA). Despite their different inspirations, both approaches are generative models and ignore the discriminative structure of the documents. In this paper, we propose the locally discriminative topic model (LDTM), a novel topic modeling approach that considers both the generative and discriminative structures of the data space. Unlike PLSA and LDA, in which the topic distribution of a document depends on all the other documents, LDTM takes a local perspective: the topic distribution of each document depends strongly on its neighbors. By modeling the relationships among documents within each neighborhood via a local linear model, we learn topic distributions that vary smoothly along the geodesics of the data manifold and better capture the discriminative structure in the data. Experimental results on text clustering and web page categorization demonstrate the effectiveness of the proposed approach.
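
As a rough illustration of the locality idea in the abstract, the Python sketch below builds a k-nearest-neighbor graph over documents and scores how smoothly a set of topic distributions varies across it. It is not the authors' LDTM: it swaps in a plain graph-Laplacian smoothness penalty in place of the paper's local linear model. The function names `knn_graph` and `smoothness_penalty`, the use of cosine similarity, and the parameter `k` are all assumptions made for this example.

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric 0/1 k-NN adjacency from cosine similarity between rows of X.

    Hypothetical helper: the abstract does not say how neighborhoods are built.
    """
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)   # unit-length rows (guard against zeros)
    S = Xn @ Xn.T                       # pairwise cosine similarity
    np.fill_diagonal(S, -np.inf)        # a document is not its own neighbor
    n = X.shape[0]
    idx = np.argsort(-S, axis=1)[:, :k] # k most similar documents per row
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(W, W.T)           # symmetrize the graph

def smoothness_penalty(theta, W):
    """tr(theta^T L theta) with L = D - W.

    For symmetric W this equals the sum over edges of the squared distance
    between neighboring topic distributions. A stand-in regularizer, not the
    paper's local linear model.
    """
    L = np.diag(W.sum(axis=1)) - W
    return float(np.trace(theta.T @ L @ theta))

# Toy corpus: documents 0 and 1 share vocabulary, as do documents 2 and 3.
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
W = knn_graph(X, k=1)

# Rows are per-document topic distributions over two topics.
theta_smooth = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
theta_rough = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])

print(smoothness_penalty(theta_smooth, W))  # neighbors agree: low penalty
print(smoothness_penalty(theta_rough, W))   # neighbors disagree: higher penalty
```

In a full model, a penalty of this kind would be added to the generative log-likelihood so that nearby documents are pushed toward similar topic distributions. How LDTM combines the two terms is not stated in the abstract.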