Empirical study of machine learning based approach for opinion mining in tweets

  • Authors:
  • Grigori Sidorov;Sabino Miranda-Jiménez;Francisco Viveros-Jiménez;Alexander Gelbukh;Noé Castro-Sánchez;Francisco Velásquez;Ismael Díaz-Rangel;Sergio Suárez-Guerra;Alejandro Treviño;Juan Gordon

  • Affiliations:
  • Center for Computing Research, Instituto Politécnico Nacional, Mexico City, Mexico (Sidorov, Miranda-Jiménez, Viveros-Jiménez, Gelbukh, Castro-Sánchez, Velásquez, Díaz-Rangel, Suárez-Guerra); Intellego SC, Mexico City, Mexico (Treviño, Gordon)

  • Venue:
  • MICAI'12: Proceedings of the 11th Mexican International Conference on Advances in Artificial Intelligence, Volume Part I
  • Year:
  • 2012

Abstract

Opinion mining deals with determining the sentiment orientation (positive, negative, or neutral) of a (short) text. Recently, it has attracted great interest in both academia and industry because of its many potential applications. One of the most promising applications is the analysis of opinions in social networks. In this paper, we examine how classifiers perform at opinion mining over Spanish Twitter data. We explore how different settings (n-gram size, corpus size, number of sentiment classes, balanced vs. unbalanced corpus, and various domains) affect the precision of the machine learning algorithms. We experimented with Naïve Bayes, Decision Tree, and Support Vector Machines. We also describe language-specific preprocessing of tweets, in our case for Spanish. The paper presents the best parameter settings for practical applications of opinion mining in Spanish Twitter. We also present a novel resource for the analysis of emotions in texts: the Spanish Emotion Lexicon, a dictionary of 2,036 words, each marked with its probability of expressing one of the six basic emotions (the Probability Factor of Affective use, PFA).
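To make the "n-gram size" setting concrete, here is a minimal sketch of one of the three classifiers named above: a multinomial Naïve Bayes over word n-grams, written from scratch. It is not the authors' implementation, and the abstract does not specify their tokenization, feature weighting, or toolkit. The regex tokenizer, add-one (Laplace) smoothing, the hypothetical helper names `ngrams` and `NaiveBayes`, and the four toy Spanish "tweets" are all illustrative assumptions.

```python
import math
import re
from collections import Counter, defaultdict


def ngrams(text, n):
    """Lowercase, split on (Unicode) word characters, return word n-grams of size 1..n."""
    tokens = re.findall(r"\w+", text.lower())  # \w matches accented letters such as é, ñ
    grams = []
    for size in range(1, n + 1):
        for i in range(len(tokens) - size + 1):
            grams.append(" ".join(tokens[i:i + size]))
    return grams


class NaiveBayes:
    """Hypothetical multinomial Naive Bayes with add-one smoothing over n-gram counts."""

    def __init__(self, n=1):
        self.n = n                        # maximum n-gram size (the setting varied in the paper)
        self.class_docs = Counter()       # number of training texts per class
        self.counts = defaultdict(Counter)  # n-gram counts per class
        self.vocab = set()                # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            feats = ngrams(text, self.n)
            self.class_docs[label] += 1
            self.counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        total_docs = sum(self.class_docs.values())
        feats = ngrams(text, self.n)
        best, best_score = None, -math.inf
        for label, ndocs in self.class_docs.items():
            total = sum(self.counts[label].values())
            # log prior + sum of smoothed log likelihoods (log space avoids underflow)
            score = math.log(ndocs / total_docs)
            for f in feats:
                score += math.log((self.counts[label][f] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best, best_score = label, score
        return best


# Toy, made-up training data (not from the paper's corpus).
train = [
    "me encanta este teléfono",
    "excelente servicio, muy bueno",
    "odio esperar tanto",
    "pésimo servicio, muy malo",
]
labels = ["positive", "positive", "negative", "negative"]

clf = NaiveBayes(n=2).fit(train, labels)
print(clf.predict("servicio muy bueno"))  # expected: positive
print(clf.predict("pésimo, muy malo"))    # expected: negative
```

Raising `n` adds bigrams such as "muy bueno" as features, which lets the model distinguish it from "muy malo" even though both share "muy"; the abstract's point is that such settings measurably change precision, and this sketch only shows where that knob sits.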