Supervised cross-collection topic modeling

  • Authors:
  • Haidong Gao; Siliang Tang; Yin Zhang; Dapeng Jiang; Fei Wu; Yueting Zhuang

  • Affiliations:
  • College of Computer Science, Zhejiang University, Hangzhou, Zhejiang, China (all authors)

  • Venue:
  • Proceedings of the 20th ACM international conference on Multimedia
  • Year:
  • 2012

Abstract

Nowadays, vast amounts of multimedia data are available across different collections (or domains). Utilizing such cross-collection data poses significant challenges, for example summarizing the similarities and differences of data across domains (e.g., CNN and NYT), or finding visually similar images across different visual domains (e.g., photos, paintings, and hand-drawn sketches). In this paper, a supervised cross-collection Latent Dirichlet Allocation (scLDA) approach is proposed to utilize data across different collections. As a natural extension of traditional Latent Dirichlet Allocation (LDA), scLDA not only takes the structural priors of different collections into consideration, but also exploits category information. The strength of this work lies in integrating topic modeling, cross-domain learning, and supervised learning. We apply scLDA to comparative text mining as well as to the classification of news articles and images from different collections. The results suggest that scLDA generates meaningful collection-specific topics and achieves better retrieval accuracy than related topic models.
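For context on the base model that scLDA extends, the following is a minimal sketch of plain LDA inference via collapsed Gibbs sampling. This is not the authors' scLDA (which additionally models collection structure and category labels); it is the standard LDA baseline, with hypothetical hyperparameter values chosen for illustration.

```python
import numpy as np

def lda_gibbs(docs, n_topics, n_vocab, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for plain LDA.

    docs: list of documents, each a list of integer word ids.
    Returns (phi, theta): topic-word and doc-topic distributions.
    """
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), n_topics))  # doc-topic counts
    nkw = np.zeros((n_topics, n_vocab))    # topic-word counts
    nk = np.zeros(n_topics)                # tokens per topic
    z = []                                 # topic assignment per token

    # Random initialization of topic assignments.
    for d, doc in enumerate(docs):
        zd = rng.integers(n_topics, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

    # Gibbs sweeps: resample each token's topic from its full conditional.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + n_vocab * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

    # Posterior mean estimates of the topic-word and doc-topic distributions.
    phi = (nkw + beta) / (nkw.sum(1, keepdims=True) + n_vocab * beta)
    theta = (ndk + alpha) / (ndk.sum(1, keepdims=True) + n_topics * alpha)
    return phi, theta
```

scLDA, as described in the abstract, would build on this by adding collection-level structural priors and supervision from category labels; those extensions are not shown here.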