πLDA: document clustering with selective structural constraints

Authors:
Siliang Tang;Hanqi Wang;Jian Shao;Fei Wu;Ming Chen;Yueting Zhuang
Affiliations:
Zhejiang University, Hangzhou, China;Zhejiang University, Hangzhou, China;Zhejiang University, Hangzhou, China;Zhejiang University, Hangzhou, China;Zhejiang University, Hangzhou, China;Zhejiang University, Hangzhou, China
Venue:
Proceedings of the 21st ACM international conference on Multimedia
Year:
2013

Abstract

Segments, such as those delimited by sentence boundaries in text or by annotated regions in images, provide useful structural constraints (i.e., priors) for unsupervised topic modeling. However, some segment units (e.g., words in text or visual words in images) inside a given segment may be irrelevant to the topic of that segment because of their own characteristics. This paper proposes a model called πLDA, which introduces a latent variable π into latent Dirichlet allocation (LDA), a traditional topic model, to capture the characteristic of each segment unit. In other words, πLDA determines whether each segment unit is assigned (i.e., selected) to the topic of its enclosing segment. Unlike approaches that assume all segment units in a segment share a common topic, πLDA can selectively discover the discriminative segment units (e.g., informative words or visual words). Experimental results, together with their interpretation, demonstrate the promising performance of the method.
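
The selection idea described in the abstract can be illustrated with a small generative sketch. This is not the authors' model: the abstract does not specify the generative process, so everything below is an assumption made for illustration. In particular, `pi` is simplified to one fixed Bernoulli probability, unselected units are assumed to fall back to an ordinary per-unit LDA topic draw, and all function and parameter names are hypothetical.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    """Draw an index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(segment_lengths, topic_word, alpha, pi, rng):
    """Generate one document as a list of segments.

    Each segment unit is a tuple (word_id, topic_id, selected).
    ASSUMPTION: `pi` is a single selection probability, and an
    unselected unit draws its own topic from the document mixture.
    """
    theta = sample_dirichlet(alpha, rng)  # document-level topic mixture
    document = []
    for length in segment_lengths:
        segment_topic = sample_categorical(theta, rng)  # one topic per segment
        segment = []
        for _ in range(length):
            selected = rng.random() < pi  # does this unit follow its segment?
            if selected:
                topic = segment_topic
            else:
                topic = sample_categorical(theta, rng)  # assumed fallback
            word = sample_categorical(topic_word[topic], rng)
            segment.append((word, topic, selected))
        document.append(segment)
    return document

# Toy setup: 3 topics over a vocabulary of 8 word ids (illustrative values).
rng = random.Random(0)
num_topics, vocab_size = 3, 8
topic_word = [sample_dirichlet([0.5] * vocab_size, rng) for _ in range(num_topics)]
doc = generate_document([4, 6, 5], topic_word, [1.0] * num_topics, pi=0.7, rng=rng)
```

Setting `pi=1.0` in this sketch recovers the stricter assumption that every unit in a segment shares the segment's topic, while `pi=0.0` reduces it to per-unit topic draws with no structural constraint.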