Using sequence kernels to identify opinion entities in Urdu

  • Authors:
  • Smruthi Mukund;Debanjan Ghosh;Rohini K. Srihari

  • Affiliations:
  • SUNY at Buffalo, NY;Thomson Reuters Corporate R&D;SUNY at Buffalo, NY

  • Venue:
  • CoNLL '11 Proceedings of the Fifteenth Conference on Computational Natural Language Learning
  • Year:
  • 2011

Quantified Score

Hi-index 0.01

Visualization

Abstract

Automatic extraction of opinion holders and targets (together referred to as opinion entities) is an important subtask of sentiment analysis. In this work, we attempt to accurately extract opinion entities from Urdu newswire. Due to the lack of resources required for training role labelers and dependency parsers (as in English) for Urdu, a more robust approach based on (i) generating candidate word sequences corresponding to opinion entities, and (ii) subsequently disambiguating these sequences as opinion holders or targets is presented. Detecting the boundaries of such candidate sequences in Urdu is very different than in English since in Urdu, grammatical categories such as tense, gender and case are captured in word inflections. In this work, we exploit the morphological inflections associated with nouns and verbs to correctly identify sequence boundaries. Different levels of information that capture context are encoded to train standard linear and sequence kernels. To this end the best performance obtained for opinion entity detection for Urdu sentiment analysis is 58.06% F-Score using sequence kernels and 61.55% F-Score using a combination of sequence and linear kernels.