Analyzing Urdu social media for sentiments using transfer learning with controlled translations

  • Authors:
  • Smruthi Mukund;Rohini K. Srihari

  • Affiliations:
  • University at Buffalo, SUNY, Buffalo, NY;University at Buffalo, SUNY, Buffalo, NY

  • Venue:
  • LSM '12 Proceedings of the Second Workshop on Language in Social Media
  • Year:
  • 2012

Abstract

The main aim of this work is to perform sentiment analysis on Urdu blog data. We use structural correspondence learning (SCL) to transfer sentiment analysis learning from Urdu newswire data to Urdu blog data. Identifying the pivots needed to transfer learning from the newswire domain to the blog domain is not trivial because Urdu blog data, unlike newswire data, is written in Latin script and exhibits code-mixing and code-switching behavior. We consider two oracles to generate the pivots: (1) a transliteration oracle, to accommodate script variation and spelling variation, and (2) a translation oracle, to accommodate code-switching and code-mixing behavior. To identify strong candidates for translation, we propose a novel part-of-speech (POS) tagging method that helps select words based on POS categories that strongly reflect code-mixing behavior. We validate our approach against a supervised learning method and show that the performance of our proposed approach is comparable.
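
For readers unfamiliar with SCL, the general idea can be sketched as follows. This is an illustrative sketch of the generic technique, not the authors' implementation. It assumes the pivot feature indices are already given (in the paper they come from the transliteration and translation oracles, which are not modeled here), and it swaps in ridge regression for the pivot predictors to keep the code short. The names `scl_projection`, `augment`, `k`, and `ridge` are hypothetical.

```python
import numpy as np

def scl_projection(X_unlabeled, pivot_idx, k=2, ridge=1.0):
    """Learn a shared low-dimensional projection from unlabeled data.

    X_unlabeled: (n_samples, n_features) binary feature matrix pooled
                 from both domains (e.g. newswire and blog).
    pivot_idx:   indices of pivot features (assumed given here).
    """
    X = np.asarray(X_unlabeled, dtype=float)
    nonpivot_idx = np.setdiff1d(np.arange(X.shape[1]), pivot_idx)
    Xn = X[:, nonpivot_idx]

    # One prediction task per pivot: does the pivot occur (+1) or not (-1)?
    Y = 2.0 * X[:, pivot_idx] - 1.0

    # Simplification: ridge regression solves all pivot predictors at once.
    A = Xn.T @ Xn + ridge * np.eye(Xn.shape[1])
    W = np.linalg.solve(A, Xn.T @ Y)  # shape (n_nonpivot, n_pivots)

    # The top-k left singular vectors span the shared feature subspace.
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    theta = U[:, :k].T  # shape (k, n_nonpivot)
    return nonpivot_idx, theta

def augment(X, nonpivot_idx, theta):
    """Append the projected features to the original feature vectors."""
    X = np.asarray(X, dtype=float)
    return np.hstack([X, X[:, nonpivot_idx] @ theta.T])
```

A sentiment classifier would then be trained on `augment(...)` applied to labeled source-domain data and applied to target-domain data augmented the same way.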