Predicting the political sentiment of web log posts using supervised machine learning techniques coupled with feature selection

  • Authors:
  • Kathleen T. Durant;Michael D. Smith

  • Affiliations:
  • Harvard University, Harvard School of Engineering and Applied Sciences, Cambridge, MA; Harvard University, Harvard School of Engineering and Applied Sciences, Cambridge, MA

  • Venue:
  • WebKDD'06 Proceedings of the 8th Knowledge discovery on the web international conference on Advances in web mining and web usage analysis
  • Year:
  • 2006

Abstract

As the number of web logs grows dramatically, readers are turning to them as an important source of information. Automatic techniques that identify the political sentiment of web log posts will help bloggers categorize and filter this exploding information source. In this paper we illustrate the effectiveness of supervised learning for sentiment classification on web log posts. We show that a Naïve Bayes classifier coupled with a forward feature selection technique correctly predicts a post's sentiment 89.77% of the time on average, with a standard deviation of 3.01. It significantly outperforms Support Vector Machines (SVMs) at the 95% confidence level, with a confidence interval of [1.5, 2.7] for the difference in accuracy. On average, feature selection improves results by 11.84% for Naïve Bayes and by 12.18% for SVMs. Previous sentiment classification research achieved 81% accuracy using Naïve Bayes and 82.9% using SVMs on a corpus from the movie domain.
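The idea of wrapping forward feature selection around a Naïve Bayes classifier can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a Bernoulli (word-presence) Naïve Bayes with Laplace smoothing, a greedy criterion of held-out validation accuracy, and a stopping rule that halts when no single added word improves that accuracy. The function names (`train_nb`, `predict_nb`, `forward_select`) and the tiny "support"/"oppose" corpus are hypothetical and are not data from the paper.

```python
import math

def train_nb(docs, labels, features):
    """Bernoulli Naive Bayes over the given word features (assumed model)."""
    model = {}
    for c in sorted(set(labels)):
        class_docs = [set(d) for d, y in zip(docs, labels) if y == c]
        n = len(class_docs)
        # Laplace-smoothed P(word present | class); always strictly in (0, 1).
        p = {f: (sum(f in d for d in class_docs) + 1) / (n + 2) for f in features}
        model[c] = (math.log(n / len(docs)), p)
    return model

def predict_nb(model, doc):
    words = set(doc)
    def log_posterior(c):
        log_prior, p = model[c]
        return log_prior + sum(math.log(p[f] if f in words else 1 - p[f]) for f in p)
    return max(model, key=log_posterior)

def accuracy(model, docs, labels):
    return sum(predict_nb(model, d) == y for d, y in zip(docs, labels)) / len(labels)

def forward_select(train_docs, train_labels, val_docs, val_labels, candidates):
    """Greedily add the word that most improves validation accuracy."""
    selected = set()
    best = accuracy(train_nb(train_docs, train_labels, selected), val_docs, val_labels)
    remaining = set(candidates)
    while remaining:
        scored = [
            (accuracy(train_nb(train_docs, train_labels, selected | {w}), val_docs, val_labels), w)
            for w in sorted(remaining)
        ]
        acc, w = max(scored, key=lambda t: t[0])  # first (alphabetical) best word
        if acc <= best:
            break  # no single addition helps: stop
        selected.add(w)
        remaining.remove(w)
        best = acc
    return selected, best

# Hypothetical toy corpus, for illustration only.
train_docs = [s.split() for s in [
    "great policy strong leader", "strong economy great plan",
    "failed policy weak leader", "weak economy failed plan"]]
train_labels = ["support", "support", "oppose", "oppose"]
val_docs = [["great", "leader"], ["failed", "economy"]]
val_labels = ["support", "oppose"]

candidates = {w for d in train_docs for w in d}
selected, best = forward_select(train_docs, train_labels, val_docs, val_labels, candidates)
print(sorted(selected), best)  # ['failed'] 1.0
```

A Bernoulli model is used here because, with a multinomial model restricted to a single word, that word's class-conditional probability is 1 for every class, so the first greedy step could not tell candidate words apart.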