Generating semantic orientation lexicon using large data and thesaurus

  • Authors:
  • Amit Goyal;Hal Daumé, III

  • Affiliations:
  • University of Maryland, College Park, MD;University of Maryland, College Park, MD

  • Venue:
  • WASSA '11 Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

We propose a novel method to construct semantic orientation lexicons using large data and a thesaurus. To deal with large data, we use Count-Min sketch to store the approximate counts of all word pairs in a bounded space of 8GB. We use a thesaurus (like Roget) to constrain near-synonymous words to have the same polarity. This framework can easily scale to any language with a thesaurus and a unzipped corpus size ≥ 50 GB (12 billion to-kens). We evaluate these lexicons intrinsically and extrinsically, and they perform comparable when compared to other existing lexicons.