Mining co-occurrence matrices for SO-PMI paradigm word candidates

Authors:
Aleksander Wawer
Affiliations:
Polish Academy of Sciences, Warszawa, Poland
Venue:
EACL '12 Proceedings of the Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics
Year:
2012

References:
Measuring praise and criticism: Inference of semantic orientation from association. ACM Transactions on Information Systems (TOIS).
Discovering global patterns in linguistic networks through spectral analysis: a case study of the consonant inventories. EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics.
Modifying SO-PMI for Japanese Weblog Opinion Mining by using a balancing factor and detecting neutral expressions. NAACL-Short '07 Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers.
Weakly supervised techniques for domain-independent sentiment classification. Proceedings of the 1st international CIKM workshop on Topic-sentiment analysis for mass opinion.
From frequency to meaning: vector space models of semantics. Journal of Artificial Intelligence Research.

Abstract

This paper focuses on one aspect of SO-PMI, an unsupervised approach to sentiment vocabulary acquisition proposed by Turney (Turney and Littman, 2003). The method, originally applied and evaluated for English, is often used to bootstrap sentiment lexicons for European languages for which such resources typically do not exist. SO-PMI values are computed from word co-occurrence frequencies in the neighbourhoods of two small sets of paradigm words, one positive and one negative. The goal of this work is to investigate how the selection of paradigm lexemes affects the quality of the resulting sentiment estimates. To this end, ad hoc random lexeme selection is compared with two alternative heuristics, based on clustering and on singular value decomposition (SVD) of a word co-occurrence matrix; both heuristics outperform random selection. The work can also be interpreted as a sensitivity analysis of SO-PMI with regard to paradigm word selection. The experiments were carried out for Polish.
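
To make the role of the paradigm words concrete, the following minimal sketch computes an SO-PMI score from co-occurrence counts, using the standard definition: the summed PMI of a word with the positive paradigm words minus its summed PMI with the negative ones. It is an illustration only, not the paper's implementation. The function names, the smoothing constant, the toy counts, and the one-word paradigm sets are all assumptions chosen for the example.

```python
import math

def pmi(cooc, count_a, count_b, total, smoothing=0.01):
    """Pointwise mutual information (base 2) estimated from raw counts.

    `smoothing` is an assumed small constant that avoids log(0) when
    two words never co-occur.
    """
    return math.log2(((cooc + smoothing) * total) / (count_a * count_b))

def so_pmi(word, positive, negative, counts, cooc, total):
    """Association with positive paradigm words minus association
    with negative paradigm words."""
    def association(paradigm):
        return sum(
            pmi(cooc.get(frozenset((word, p)), 0), counts[word], counts[p], total)
            for p in paradigm
        )
    return association(positive) - association(negative)

# Hypothetical toy data: word frequencies and co-occurrence counts
# within some fixed neighbourhood window (all numbers invented).
counts = {"dobry": 100, "zły": 100, "świetny": 50, "stół": 40}
cooc = {
    frozenset(("świetny", "dobry")): 20,
    frozenset(("świetny", "zły")): 2,
    frozenset(("stół", "dobry")): 5,
    frozenset(("stół", "zły")): 5,
}
total = 10_000  # assumed corpus size in tokens

positive = ["dobry"]  # "good"
negative = ["zły"]    # "bad"

score_great = so_pmi("świetny", positive, negative, counts, cooc, total)
score_table = so_pmi("stół", positive, negative, counts, cooc, total)
```

In this toy setting, a word that co-occurs more often with the positive set receives a positive score, and a word that co-occurs equally with both sets scores zero. Swapping in different paradigm sets changes every score, which is the sensitivity the abstract refers to.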