Lexicon based sentiment analysis of Urdu text using SentiUnits

  • Authors:
  • Afraz Z. Syed;Muhammad Aslam;Ana Maria Martinez-Enriquez

  • Affiliations:
  • Department of CS & E, U.E.T., Lahore, Pakistan; Department of CS & E, U.E.T., Lahore, Pakistan; Department of CS, CINVESTAV, IPN, D.F., Mexico

  • Venue:
  • MICAI'10: Proceedings of the 9th Mexican International Conference on Advances in Artificial Intelligence, Part I
  • Year:
  • 2010

Abstract

As with websites in other languages, Urdu websites are becoming more popular, because people prefer to share opinions and express sentiments in their own language. Sentiment analyzers developed for other well-studied languages, such as English, do not work for Urdu because of differences in script, morphology, and grammar. Urdu should therefore be studied as an independent problem domain. Our approach to sentiment analysis is based on identifying and extracting SentiUnits from the given text using shallow parsing. SentiUnits are the expressions in a sentence that carry its sentiment information. We use a lexicon-based approach built on a sentiment-annotated lexicon. Unfortunately, no such lexicon exists for Urdu, so a major part of this research consists of developing one. This paper is therefore presented as a baseline for this large and complex task. Our goal is to highlight both the linguistic (grammatical and morphological) and the technical aspects of this multidimensional research problem. The system's performance was evaluated on multiple texts, and the results obtained are satisfactory.
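The abstract does not specify the lexicon format or the shallow-parsing rules, so the following Python sketch is illustrative only and is not the authors' method. It shows what lexicon-based SentiUnit extraction might look like under simplified assumptions. A hypothetical toy lexicon (`POLARITY`) assigns polarity to head adjectives, whitespace splitting stands in for a real shallow parser, an intensifier such as بہت ("very") is attached from the left, and a negator such as نہیں ("not") is attached from the right, as in اچھی نہیں ہے ("is not good"). All names (`POLARITY`, `INTENSIFIERS`, `NEGATORS`, `SentiUnit`, `extract_sentiunits`, `sentence_polarity`) and all weights are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon (not the paper's): polarity of head adjectives.
# Gender/number inflections are listed separately because this sketch
# performs no morphological analysis.
POLARITY = {
    "اچھا": 1, "اچھی": 1, "اچھے": 1,  # good (masc., fem., plural/oblique)
    "برا": -1, "بری": -1, "برے": -1,  # bad
}
INTENSIFIERS = {"بہت": 2.0}  # "very": hypothetical weight
NEGATORS = {"نہیں", "نہ"}    # "not"


@dataclass
class SentiUnit:
    tokens: list[str]  # the head word plus any attached modifiers
    score: float       # signed sentiment strength of this unit


def extract_sentiunits(sentence: str) -> list[SentiUnit]:
    """Group each lexicon head word with its adjacent modifiers."""
    tokens = sentence.split()  # naive stand-in for shallow parsing
    units = []
    for i, tok in enumerate(tokens):
        if tok not in POLARITY:
            continue
        start, end = i, i + 1
        score = float(POLARITY[tok])
        # An intensifier immediately precedes the adjective: "بہت اچھی".
        if i > 0 and tokens[i - 1] in INTENSIFIERS:
            score *= INTENSIFIERS[tokens[i - 1]]
            start = i - 1
        # Negation follows the adjective, before the copula: "اچھی نہیں ہے".
        if i + 1 < len(tokens) and tokens[i + 1] in NEGATORS:
            score = -score
            end = i + 2
        units.append(SentiUnit(tokens[start:end], score))
    return units


def sentence_polarity(sentence: str) -> str:
    """Classify a sentence by summing the scores of its SentiUnits."""
    total = sum(unit.score for unit in extract_sentiunits(sentence))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"
```

For example, `sentence_polarity("یہ فلم اچھی نہیں ہے")` ("this film is not good") returns `"negative"`. In `extract_sentiunits("یہ فلم بہت اچھی ہے")` ("this film is very good"), the extracted unit is `["بہت", "اچھی"]` with score `2.0`. Listing each inflected form in the lexicon is a crude substitute for the morphological handling that the abstract identifies as one of the main challenges for Urdu.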