Supervised cross-collection topic modeling

  • Authors:
  • Haidong Gao; Siliang Tang; Yin Zhang; Dapeng Jiang; Fei Wu; Yueting Zhuang

  • Affiliations:
  • College of Computer Science, Zhejiang University, Hangzhou, Zhejiang, China (all authors)

  • Venue:
  • Proceedings of the 20th ACM international conference on Multimedia
  • Year:
  • 2012

Abstract

Nowadays, vast amounts of multimedia data are available across different collections (or domains). Utilizing such cross-collection data poses significant challenges, for example summarizing the similarities and differences of data across domains (e.g., CNN and NYT), or finding visually similar images across different visual domains (e.g., photos, paintings, and hand-drawn sketches). In this paper, a supervised cross-collection Latent Dirichlet Allocation (scLDA) approach is proposed to utilize data across different collections. As a natural extension of traditional Latent Dirichlet Allocation (LDA), scLDA not only takes the structural priors of different collections into consideration, but also exploits category information. The strength of this work lies in integrating topic modeling, cross-domain learning, and supervised learning. We apply scLDA to comparative text mining as well as to the classification of news articles and images from different collections. The results suggest that scLDA generates meaningful collection-specific topics and achieves better retrieval accuracy than related topic models.
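For context on the base model that scLDA extends, the following is a minimal sketch of plain LDA inference via collapsed Gibbs sampling. This is not the authors' scLDA (which additionally models collection structure and category labels); it is the standard LDA baseline, with hypothetical hyperparameter values chosen for illustration.

```python
import numpy as np

def lda_gibbs(docs, n_topics, n_vocab, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for plain LDA.

    docs: list of documents, each a list of integer word ids.
    Returns (phi, theta): topic-word and doc-topic distributions.
    """
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), n_topics))  # doc-topic counts
    nkw = np.zeros((n_topics, n_vocab))    # topic-word counts
    nk = np.zeros(n_topics)                # tokens per topic
    z = []                                 # topic assignment per token

    # Random initialization of topic assignments.
    for d, doc in enumerate(docs):
        zd = rng.integers(n_topics, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

    # Gibbs sweeps: resample each token's topic from its full conditional.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + n_vocab * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

    # Posterior mean estimates of the topic-word and doc-topic distributions.
    phi = (nkw + beta) / (nkw.sum(1, keepdims=True) + n_vocab * beta)
    theta = (ndk + alpha) / (ndk.sum(1, keepdims=True) + n_topics * alpha)
    return phi, theta
```

scLDA, as described in the abstract, would build on this by adding collection-level structural priors and supervision from category labels; those extensions are not shown here.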