Generating semantic orientation lexicon using large data and thesaurus

Authors:
Amit Goyal;Hal Daumé, III
Affiliations:
University of Maryland, College Park, MD;University of Maryland, College Park, MD
Venue:
WASSA '11 Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis
Year:
2011

Abstract

We propose a novel method to construct semantic orientation lexicons using large data and a thesaurus. To deal with large data, we use a Count-Min sketch to store the approximate counts of all word pairs in a bounded space of 8 GB. We use a thesaurus (like Roget) to constrain near-synonymous words to have the same polarity. This framework can easily scale to any language with a thesaurus and an unzipped corpus size ≥ 50 GB (12 billion tokens). We evaluate these lexicons intrinsically and extrinsically, and they perform comparably to other existing lexicons.
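
As a rough illustration of the data structure named in the abstract, the following is a minimal Count-Min sketch for approximate word-pair counts. It is a sketch under assumptions, not the authors' implementation: the class name `CountMinSketch`, the helper `pair_key`, the salted BLAKE2b hashing, and the `width` and `depth` values are all hypothetical choices made for the example. The abstract states only that the counts fit in a bounded 8 GB.

```python
import hashlib


class CountMinSketch:
    """Approximate counter in fixed memory: `depth` rows of `width` counters."""

    def __init__(self, width=1 << 16, depth=5):
        # width and depth are illustrative, not values taken from the paper.
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, key):
        # One hash function per row, obtained by salting with the row number.
        data = key.encode("utf-8")
        for row in range(self.depth):
            salt = row.to_bytes(8, "little")
            digest = hashlib.blake2b(data, digest_size=8, salt=salt).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        for row, col in self._cells(key):
            self.table[row][col] += count

    def estimate(self, key):
        # Collisions can only inflate a counter, so the minimum over rows
        # is the tightest estimate and never falls below the true count.
        return min(self.table[row][col] for row, col in self._cells(key))


def pair_key(w1, w2):
    # Order-independent key for a word pair (an assumption of this example).
    return "\t".join(sorted((w1, w2)))


sketch = CountMinSketch(width=1 << 12, depth=4)
for w1, w2 in [("good", "excellent"), ("excellent", "good"), ("good", "poor")]:
    sketch.add(pair_key(w1, w2))

# The estimate is an upper bound on the true count, which is 2 here.
print(sketch.estimate(pair_key("good", "excellent")))
```

Memory use is fixed at `width * depth` counters regardless of how many distinct pairs are seen, which is what makes a bounded-space budget possible. The trade-off is that estimates may overcount when pairs collide.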