Document-level sentiment classification: An empirical comparison between SVM and ANN

  • Authors:
  • Rodrigo Moraes; João Francisco Valiati; Wilson P. Gavião Neto

  • Affiliations:
  • Programa Interdisciplinar de Pós-Graduação em Computação Aplicada - PIPCA, Universidade do Vale do Rio dos Sinos - UNISINOS, Av. Unisinos, 950 São Leopoldo, RS, Brazi ...;Programa Interdisciplinar de Pós-Graduação em Computação Aplicada - PIPCA, Universidade do Vale do Rio dos Sinos - UNISINOS, Av. Unisinos, 950 São Leopoldo, RS, Brazi ...;Programa Interdisciplinar de Pós-Graduação em Computação Aplicada - PIPCA, Universidade do Vale do Rio dos Sinos - UNISINOS, Av. Unisinos, 950 São Leopoldo, RS, Brazi ...

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2013

Quantified Score

Hi-index 12.05

Abstract

Document-level sentiment classification aims to automate the task of classifying a textual review, written about a single topic, as expressing a positive or negative sentiment. In general, supervised methods consist of two stages: (i) extraction/selection of informative features and (ii) classification of reviews using learning models such as Support Vector Machines (SVM) and Naïve Bayes (NB). SVM have been used extensively and successfully as a sentiment learning approach, while Artificial Neural Networks (ANN) have rarely been considered in comparative studies in the sentiment analysis literature. This paper presents an empirical comparison between SVM and ANN for document-level sentiment analysis. We discuss the requirements, the resulting models, and the contexts in which each approach achieves better classification accuracy. We adopt a standard evaluation context with popular supervised methods for feature selection and weighting in a traditional bag-of-words model. Except for some unbalanced data contexts, our experiments indicate that ANN produce results superior or at least comparable to those of SVM. In particular, on the benchmark dataset of movie reviews, ANN outperformed SVM by a statistically significant margin, even in the context of unbalanced data. Our results also confirm some potential limitations of both models that have rarely been discussed in the sentiment classification literature, such as the computational cost of SVM at running time and of ANN at training time.
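
As a rough illustration of stage (i) in the abstract, the following sketch builds a bag-of-words representation, ranks terms with a supervised score, and weights the selected terms. The abstract does not name the specific selection or weighting methods, so information gain and TF-IDF are assumptions made here for illustration; the toy reviews and all function names are hypothetical. The resulting vectors are the kind of input that a classifier such as an SVM or an ANN would receive in stage (ii).

```python
import math
from collections import Counter

# Toy labelled reviews (hypothetical data): 1 = positive, 0 = negative.
REVIEWS = [
    ("a great and moving film", 1),
    ("great acting and a great script", 1),
    ("a boring and awful film", 0),
    ("awful script and boring acting", 0),
]


def tokenize(text):
    """Split a review into lowercase whitespace-separated tokens."""
    return text.lower().split()


def entropy(pos, neg):
    """Binary entropy, in bits, of a positive/negative class split."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(term, docs):
    """Reduction in class entropy from knowing whether `term` occurs."""
    n = len(docs)
    n_pos = sum(label for _, label in docs)
    h_before = entropy(n_pos, n - n_pos)
    with_term = [label for tokens, label in docs if term in tokens]
    without_term = [label for tokens, label in docs if term not in tokens]
    h_after = 0.0
    for part in (with_term, without_term):
        if part:
            pos = sum(part)
            h_after += len(part) / n * entropy(pos, len(part) - pos)
    return h_before - h_after


def select_features(docs, k):
    """Keep the k terms with the highest information gain (ties: alphabetical)."""
    vocab = {term for tokens, _ in docs for term in tokens}
    ranked = sorted(vocab, key=lambda t: (-information_gain(t, docs), t))
    return ranked[:k]


def tfidf_vector(tokens, features, docs):
    """Weight each selected term by raw term frequency times log(N / df)."""
    n = len(docs)
    counts = Counter(tokens)
    vector = []
    for term in features:
        df = sum(1 for doc_tokens, _ in docs if term in doc_tokens)
        idf = math.log(n / df) if df else 0.0
        vector.append(counts[term] * idf)
    return vector


docs = [(tokenize(text), label) for text, label in REVIEWS]
features = select_features(docs, k=3)
vectors = [tfidf_vector(tokens, features, docs) for tokens, _ in docs]
```

On this toy corpus, terms that occur in every review or equally often in both classes receive an information gain of zero, so they are ranked last and dropped, which is the motivation for supervised selection before classification.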