Multilingual Sentiment Analysis Using Latent Semantic Indexing and Machine Learning

  • Authors:
  • Brett W. Bader;W. Philip Kegelmeyer;Peter A. Chew

  • Affiliations:
  • -;-;-

  • Venue:
  • ICDMW '11 Proceedings of the 2011 IEEE 11th International Conference on Data Mining Workshops
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

We present a novel approach to predicting the sentiment of documents in multiple languages, without translation. The only prerequisite is a multilingual parallel corpus wherein a training sample of the documents, in a single language only, have been tagged with their overall sentiment. Latent Semantic Indexing (LSI) converts that multilingual corpus into a multilingual ``concept space''. Both training and test documents can be projected into that space, allowing cross-lingual semantic comparisons between the documents without the need for translation. Accordingly, the training documents with known sentiment are used to build a machine learning model which can, because of the multilingual nature of the document projections, be used to predict sentiment in the other languages. We explain and evaluate the accuracy of this approach. We also design and conduct experiments to investigate the extent to which topic and sentiment {\em separately} contribute to that classification accuracy, and thereby shed some initial light on the question of whether topic and sentiment can be sensibly teased apart.