Using canonical correlation analysis for generalized sentiment analysis, product recommendation and search

  • Authors:
  • Siamak Faridani

  • Affiliations:
  • UC Berkeley, Berkeley, CA, USA

  • Venue:
  • Proceedings of the fifth ACM conference on Recommender systems
  • Year:
  • 2011

Abstract

Standard sentiment analysis applies Natural Language Processing methods to assess the "approval" value of a given text, either categorizing it as "negative", "neutral", or "positive" or scoring it on a linear scale. Sentiment analysis can be used to infer users' rating values from textual reviews of items such as books, films, or products. We propose an approach that generalizes the concept to multiple dimensions, estimating user ratings along multiple axes such as "service", "price", and "value". We use Canonical Correlation Analysis (CCA) to derive a mathematical model that can be used as a multivariate regression tool. This model has several valuable properties: it can be trained offline and applied efficiently to live streams of text such as blogs and tweets; it can be used for visualization, data clustering, and labeling; and it can potentially be incorporated into natural-language product search algorithms. Finally, we propose an evaluation procedure that can be used on live data when no ground truth is available. Based on this model, we present preliminary results from empirical data collected from our system, Opinion Space. We show that, for this dataset, the CCA model outperforms the Principal Component Analysis (PCA) model originally used in Opinion Space.
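To make the idea of "CCA as a multivariate regression tool" concrete, here is a minimal NumPy sketch. It is not the paper's implementation. It assumes each text has already been encoded as a numeric feature vector (for example, term counts), collected in a matrix `X`, and that each text's ratings along several axes (such as "service", "price", "value") form the rows of `Y`. The helper names `fit_cca`, `project`, and `predict_ratings`, along with the ridge term `reg` (added only for numerical stability), are hypothetical. Prediction maps a text to its canonical scores `u`, uses E[v | u] = ρu for each canonical pair, and regresses the ratings back from the rating-side variates `v`. The first two canonical scores can also serve as 2-D coordinates for visualization.

```python
import numpy as np


def _inv_sqrt(c: np.ndarray, reg: float) -> np.ndarray:
    """Inverse square root of a symmetric covariance matrix (ridge-regularized)."""
    vals, vecs = np.linalg.eigh(c + reg * np.eye(c.shape[0]))
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def fit_cca(X: np.ndarray, Y: np.ndarray, k: int, reg: float = 1e-3) -> dict:
    """Fit CCA between text features X (n x d) and rating vectors Y (n x p).

    Hypothetical helper; `reg` is an assumed ridge term, not part of the paper.
    """
    n = X.shape[0]
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    cxx = Xc.T @ Xc / (n - 1)
    cyy = Yc.T @ Yc / (n - 1)
    cxy = Xc.T @ Yc / (n - 1)
    wx, wy = _inv_sqrt(cxx, reg), _inv_sqrt(cyy, reg)
    # SVD of the whitened cross-covariance yields canonical directions and
    # canonical correlations (singular values, in descending order).
    u, s, vt = np.linalg.svd(wx @ cxy @ wy, full_matrices=False)
    a = wx @ u[:, :k]     # text-side canonical weights (d x k)
    b = wy @ vt.T[:, :k]  # rating-side canonical weights (p x k)
    # Least-squares map from rating-side canonical variates back to centered
    # ratings: (b^T Cyy b)^{-1} b^T Cyy, shape (k x p).
    back = np.linalg.solve(b.T @ cyy @ b, b.T @ cyy)
    return {"x_mean": x_mean, "y_mean": y_mean, "a": a, "corr": s[:k], "back": back}


def project(model: dict, X: np.ndarray) -> np.ndarray:
    """Canonical scores of texts (n x k); the first two columns can be plotted in 2-D."""
    return (X - model["x_mean"]) @ model["a"]


def predict_ratings(model: dict, X: np.ndarray) -> np.ndarray:
    """Estimate multi-axis ratings for new texts (n x p)."""
    # For unit-variance canonical pairs, the best linear predictor of v from u is rho * u.
    v_hat = project(model, X) * model["corr"]
    return model["y_mean"] + v_hat @ model["back"]
```

In this sketch, if `k` equals the number of rating axes (and does not exceed the number of text features) and `reg` is 0, the prediction reduces to ordinary least-squares regression of `Y` on `X`. Choosing a smaller `k` keeps only the most strongly correlated directions. Once the model is fitted offline, scoring a new text costs only a couple of matrix-vector products, which is consistent with the abstract's claim about efficient use on live text streams.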