Senti-lexicon and improved Naïve Bayes algorithms for sentiment analysis of restaurant reviews

  • Authors:
  • Hanhoon Kang;Seong Joon Yoo;Dongil Han

  • Affiliations:
  • Dept. of Computer Engineering, Sejong University, 98 Gunja, Gwangjin, Seoul 143-747, Republic of Korea;Dept. of Computer Engineering, Sejong University, 98 Gunja, Gwangjin, Seoul 143-747, Republic of Korea;Dept. of Computer Engineering, Sejong University, 98 Gunja, Gwangjin, Seoul 143-747, Republic of Korea

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2012

Quantified Score

Hi-index 12.05

Abstract

Existing senti-lexicons do not sufficiently cover the sentiment words used in restaurant reviews. This paper therefore proposes a new senti-lexicon for the sentiment analysis of restaurant reviews. When a supervised learning algorithm classifies review documents as positive or negative, the positive classification accuracy tends to be up to approximately 10% higher than the negative classification accuracy. This gap lowers the overall figure when the accuracies of the two classes are reported as a single average. To mitigate this problem, an improved Naïve Bayes algorithm is proposed. In the experiments, when this algorithm was used with unigrams+bigrams as features, the gap between the positive and negative accuracy narrowed to 3.6% compared with the original Naïve Bayes, and the 28.5% gap was narrowed compared with SVM. Additionally, using this algorithm with the senti-lexicon improved recall by up to 10.2% and precision by up to 26.2% compared with SVM, and recall by up to 5.6% and precision by up to 1.9% compared with Naïve Bayes.
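
To make the quantities in the abstract concrete, the following is a minimal sketch of a standard multinomial Naive Bayes baseline with unigram+bigram features, together with the per-class accuracy, the gap between the two classes, and their average. It is not the improved algorithm proposed in the paper, whose details the abstract does not give. The function names, the whitespace tokenizer, the add-one smoothing, and the toy reviews are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def features(text):
    """Unigrams plus bigrams from a lowercased, whitespace-tokenized review."""
    tokens = text.lower().split()
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


def train(docs):
    """Collect the counts a multinomial Naive Bayes needs from (text, label) pairs."""
    class_docs = Counter()             # number of documents per class
    feat_counts = defaultdict(Counter)  # feature counts per class
    vocab = set()
    for text, label in docs:
        class_docs[label] += 1
        for f in features(text):
            feat_counts[label][f] += 1
            vocab.add(f)
    return class_docs, feat_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_docs, feat_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        total = sum(feat_counts[label].values())
        score = math.log(class_docs[label] / total_docs)
        for f in features(text):
            if f in vocab:  # features never seen in training are ignored
                score += math.log((feat_counts[label][f] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def per_class_accuracy(model, docs):
    """Fraction of documents of each true class that are classified correctly."""
    correct, seen = Counter(), Counter()
    for text, label in docs:
        seen[label] += 1
        if predict(model, text) == label:
            correct[label] += 1
    return {label: correct[label] / seen[label] for label in seen}


# Hypothetical toy reviews, not data from the paper.
train_docs = [
    ("the food was delicious", "pos"),
    ("great service and tasty food", "pos"),
    ("the food was bland", "neg"),
    ("slow service and rude staff", "neg"),
]
test_docs = [("delicious food", "pos"), ("rude staff", "neg")]

model = train(train_docs)
acc = per_class_accuracy(model, test_docs)
gap = abs(acc["pos"] - acc["neg"])        # the positive/negative gap discussed above
average = (acc["pos"] + acc["neg"]) / 2   # the average of the two class accuracies
print(acc, gap, average)
```

On a realistic corpus, `gap` is the quantity the abstract reports as narrowing, and `average` is the two-class average that a large gap pulls down.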