Discovering voter preferences in blogs using mixtures of topic models

  • Authors:
  • Pradipto Das; Rohini Srihari; Smruthi Mukund

  • Affiliations:
  • University at Buffalo, Buffalo, NY; University at Buffalo, Buffalo, NY; University at Buffalo, Buffalo, NY

  • Venue:
  • Proceedings of The Third Workshop on Analytics for Noisy Unstructured Text Data
  • Year:
  • 2009

Abstract

In this paper we propose a new approach to capture the inclination of a blog towards a particular election candidate from its content and to explain why that inclination arises. The method relies on "ground truth" speeches from the election candidates, which are labeled, and on a collection of noisy blogs, which are not labeled in any way. In this unsupervised learning scenario, we used probabilistic topic models to cluster the ground truth documents for each candidate into underlying latent themes. The same topic models were then applied to the blog collection, and the "orientation" of each blog towards the different themes of the candidates' speeches was measured using the KL divergence between topic distributions over the overlapping vocabularies. We used four models for this theme matching: a baseline topic model, and three variants that weight the baseline topic model by the positive, negative and neutral sentiments of the topics. We then used a collaborative objective function to combine the candidate preferences obtained for the blogs under the four models using an Expectation Maximization algorithm. The novelty of our method lies in its use of unannotated data and in its combination of the views of different "experts" explaining the same phenomenon.
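
The theme-matching step could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes each topic is a word-to-probability mapping, that the KL divergence is taken from the blog topic to the candidate theme (the abstract does not state the direction), and that a small smoothing constant is added before renormalizing. The names `restrict_and_normalize`, `kl_divergence`, `closest_theme` and `eps` are hypothetical.

```python
import math


def restrict_and_normalize(dist, vocab, eps=1e-9):
    """Restrict a word->probability mapping to `vocab`, smooth, and renormalize.

    `eps` is an assumed smoothing constant; it keeps every probability positive.
    """
    weights = {w: dist.get(w, 0.0) + eps for w in vocab}
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same words."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def closest_theme(blog_topic, candidate_themes):
    """Return (candidate, theme_index, divergence) for the best-matching theme.

    `candidate_themes` maps a candidate name to a list of theme distributions.
    Only words shared by the blog topic and the theme are compared, mirroring
    the "overlapping vocabularies" idea. Returns None if nothing overlaps.
    """
    best = None
    for candidate, themes in candidate_themes.items():
        for index, theme in enumerate(themes):
            shared = sorted(set(blog_topic) & set(theme))
            if not shared:
                continue  # no common words, so the divergence is undefined
            p = restrict_and_normalize(blog_topic, shared)
            q = restrict_and_normalize(theme, shared)
            divergence = kl_divergence(p, q)
            if best is None or divergence < best[2]:
                best = (candidate, index, divergence)
    return best
```

A blog topic is then attributed to the candidate whose theme yields the smallest divergence. The sentiment-weighted variants and the EM-based combination of the four models are not shown, since the abstract does not specify them.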