We propose a novel method for constructing semantic orientation lexicons from large data and a thesaurus. To handle large data, we use a Count-Min sketch to store approximate counts of all word pairs in a bounded space of 8 GB. We use a thesaurus (such as Roget's) to constrain near-synonymous words to have the same polarity. This framework can easily scale to any language that has a thesaurus and an unzipped corpus of at least 50 GB (12 billion tokens). We evaluate these lexicons intrinsically and extrinsically, and they perform comparably to other existing lexicons.
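To make the storage idea concrete, here is a minimal sketch of a Count-Min sketch that keeps approximate word-pair counts in a fixed-size table. It is an illustration under stated assumptions, not the implementation used in this work: the class name `CountMinSketch`, the helper `pair_key`, the BLAKE2b-based row hashes, the choice to treat pairs as unordered, and the toy `width` and `depth` values are all hypothetical.

```python
import hashlib


class CountMinSketch:
    """Fixed-size table of counters; estimates never fall below the true count."""

    def __init__(self, width, depth):
        self.width = width    # counters per row (assumed value, sets the error bound)
        self.depth = depth    # number of hash rows (assumed value, sets the failure probability)
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # One hash function per row, derived by salting BLAKE2b with the row number.
        digest = hashlib.blake2b(
            key.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def estimate(self, key):
        # Collisions can only inflate counters, so the minimum is the tightest estimate.
        return min(
            self.table[row][self._index(key, row)] for row in range(self.depth)
        )


def pair_key(w1, w2):
    # Assumption: word pairs are unordered, so both orders share one key.
    a, b = sorted((w1, w2))
    return a + "\t" + b


sketch = CountMinSketch(width=1 << 16, depth=5)
for w1, w2 in [("good", "excellent"), ("excellent", "good"), ("good", "bad")]:
    sketch.add(pair_key(w1, w2))

print(sketch.estimate(pair_key("good", "excellent")))
```

Memory use depends only on `width` and `depth`, not on how many distinct word pairs are seen, which is what allows the counts to fit in a bounded space. The cost is that a stored estimate can exceed the true count when unrelated pairs collide in every row.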