Discovering voter preferences in blogs using mixtures of topic models

Authors:
Pradipto Das;Rohini Srihari;Smruthi Mukund
Affiliations:
University at Buffalo, Buffalo, NY;University at Buffalo, Buffalo, NY;University at Buffalo, Buffalo, NY
Venue:
Proceedings of The Third Workshop on Analytics for Noisy Unstructured Text Data
Year:
2009

References (7)

Hierarchical mixtures of experts and the EM algorithm. Neural Computation.
Latent Dirichlet allocation. The Journal of Machine Learning Research.
The political blogosphere and the 2004 U.S. election: divided they blog. Proceedings of the 3rd International Workshop on Link Discovery.
Topic sentiment mixture: modeling facets and opinions in weblogs. Proceedings of the 16th International Conference on World Wide Web.
Learning with compositional semantics as structural inference for subsentential sentiment analysis. EMNLP '08: Proceedings of the Conference on Empirical Methods in Natural Language Processing.
Predicting response to political blog posts with topic models. NAACL '09: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics.
Identifying expressions of opinion in context. IJCAI '07: Proceedings of the 20th International Joint Conference on Artificial Intelligence.

Cited by (1)

Textual predictors of bill survival in congressional committees. NAACL HLT '12: Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.

Abstract

In this paper we propose a new approach for capturing the inclination towards a particular election candidate from the contents of blogs and for explaining why that inclination exists. The method relies on labeled "ground truth" speeches from the election candidates and on a collection of noisy blogs that are not labeled in any way. In this unsupervised learning scenario, we used probabilistic topic models to cluster the ground truth documents for each candidate into underlying latent themes. The same topic models were then applied to the blog collection, and the "orientation" of each blog towards the different themes of the candidates' speeches was measured using the KL divergence between topic distributions over the overlapping vocabularies. We used four models for this theme matching: a baseline topic model, and three variants that weight the baseline topic model by the positive, negative, and neutral sentiments of the topics. We then used a collaborative objective function to combine the candidate preferences assigned to the blogs under the four models using an Expectation Maximization algorithm. The novelty of our method lies in its use of unannotated data and in its combination of the views of different "experts" explaining the same phenomenon.
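
The theme-matching step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy word distributions, the direction of the divergence (blog topic relative to speech theme), and the renormalization over shared words are all assumptions made for illustration. It covers only baseline matching, not the sentiment weighting or the Expectation Maximization combination.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) restricted to the words that appear in both distributions.

    p and q map words to probabilities. Restricting to the shared words and
    renormalizing is an assumed reading of "overlapping vocabularies".
    """
    shared = sorted(set(p) & set(q))
    if not shared:
        raise ValueError("no overlapping vocabulary")
    p_total = sum(p[w] for w in shared)
    q_total = sum(q[w] for w in shared)
    if p_total <= 0 or q_total <= 0:
        raise ValueError("no probability mass on the shared vocabulary")
    kl = 0.0
    for w in shared:
        pw = p[w] / p_total
        qw = q[w] / q_total
        if pw > 0:
            # eps guards against log of a zero denominator
            kl += pw * math.log(pw / max(qw, eps))
    return kl


def closest_theme(blog_topic, candidate_themes):
    """Return the theme with the smallest divergence, plus all scores."""
    scores = {name: kl_divergence(blog_topic, theme)
              for name, theme in candidate_themes.items()}
    return min(scores, key=scores.get), scores


# Hypothetical topic-word distributions, invented for this example.
blog_topic = {"tax": 0.5, "jobs": 0.3, "blogroll": 0.2}
themes = {
    "candidate_A/economy": {"tax": 0.6, "jobs": 0.3, "reform": 0.1},
    "candidate_B/security": {"tax": 0.1, "jobs": 0.1, "troops": 0.8},
}
best, scores = closest_theme(blog_topic, themes)
```

A lower divergence means the blog topic is closer to that speech theme, so `best` names the theme the blog is most oriented towards under these assumptions.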