Identification, classification, and analysis of opinions on the web

  • Authors: Eduard Hovy; Soo Min Kim
  • Affiliations: University of Southern California; University of Southern California
  • Venue: Identification, classification, and analysis of opinions on the web
  • Year: 2006

Abstract

In recent years, sophisticated language processing has made it possible to take on increasingly complex text analysis challenges. One such challenge is recognizing, classifying, and understanding opinionated text. This ability is desirable for various tasks, including filtering advertisements, separating the arguments in online debates or discussions, and ranking web documents cited as authorities on contentious topics. However, discussing and analyzing opinionated text is very difficult because of a general inability to define exactly what an opinion is. In this work, we introduce a methodology for defining and analyzing opinions. We decompose the task of opinion analysis into four parts: (1) recognizing the opinions; (2) identifying their valence; (3) identifying their holder and topic; and (4) identifying their reason. For more detailed analysis, we define two kinds of opinions: (1) Judgment opinions about the world, with emotive values such as good, bad, neutral, wise, foolish, virtuous, etc.; and (2) Belief opinions (specifically, those about the future, which we call Predictive opinions), with epistemic values such as likely, unlikely, possible, uncertain, etc. In this study, we introduce a method for detecting Judgment opinions that leverages a small set of seed words and a WordNet expansion algorithm. We propose an algorithm for detecting Predictive opinions, as a subclass of Belief opinions, using a feature generalization technique. This algorithm lays the foundation for exploring a whole new category of opinion detection for Belief opinions in general. For opinion holder and topic identification, we develop two methods. The first method uses syntactic features of the opinion, holder, and topic in a sentence, while the second applies semantic frames of opinion words as an intermediate step in identifying the holder and topic. We also introduce a model of opinion reason identification: we first present a novel technique that automatically labels a large corpus for this task, and then investigate our approach using lexical, structural, and semantic features. The main contribution of this work is a framework for analyzing opinions at a deeper level in terms of their core semantic elements (valence, holder, topic, and reason) and the use of this framework in a real-world application. We develop algorithms and resources and apply them across several opinion domains, including news media texts, product reviews, consumer complaints, citizens' emails, and political discussions.
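
The abstract does not spell out the seed-word expansion procedure, so the following is only a minimal illustrative sketch of the general idea and should not be read as the authors' algorithm. It assumes a breadth-first spread of valence labels from a few seed words over synonym links. The graph TOY_SYNONYMS is a hypothetical stand-in for WordNet, and the function name expand_seeds and the max_depth cutoff are likewise assumptions made for illustration.

```python
from collections import deque

# Hypothetical toy synonym graph standing in for WordNet (an assumption;
# a real system would query WordNet synsets instead).
TOY_SYNONYMS = {
    "good": ["fine", "decent"],
    "fine": ["good", "excellent"],
    "decent": ["good"],
    "excellent": ["fine"],
    "bad": ["poor", "awful"],
    "poor": ["bad"],
    "awful": ["bad", "terrible"],
    "terrible": ["awful"],
}


def expand_seeds(seeds, synonyms, max_depth=2):
    """Spread seed valence labels breadth-first over synonym links.

    Returns a dict mapping each reached word to (label, depth), where depth
    is the number of synonym hops from the seed that first reached it.
    """
    labels = {}
    queue = deque()
    for word, label in seeds.items():
        labels[word] = (label, 0)
        queue.append(word)
    while queue:
        word = queue.popleft()
        label, depth = labels[word]
        if depth >= max_depth:
            continue  # do not expand beyond the hop limit
        for neighbor in synonyms.get(word, []):
            if neighbor not in labels:  # first label to arrive wins
                labels[neighbor] = (label, depth + 1)
                queue.append(neighbor)
    return labels


lexicon = expand_seeds({"good": "positive", "bad": "negative"}, TOY_SYNONYMS)
for word, (label, depth) in sorted(lexicon.items()):
    print(f"{word}: {label} (hops from seed: {depth})")
```

In this sketch the hop count acts as a rough confidence signal: words reached after more hops are less reliably associated with the seed's valence, which is one reason to cap the expansion depth.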