Using sequence kernels to identify opinion entities in Urdu

Authors:
Smruthi Mukund;Debanjan Ghosh;Rohini K. Srihari
Affiliations:
SUNY at Buffalo, NY;Thomson Reuters Corporate R&D;SUNY at Buffalo, NY
Venue:
CoNLL '11 Proceedings of the Fifteenth Conference on Computational Natural Language Learning
Year:
2011

Abstract

Automatic extraction of opinion holders and targets (together referred to as opinion entities) is an important subtask of sentiment analysis. In this work, we attempt to accurately extract opinion entities from Urdu newswire. Because Urdu lacks the resources needed to train role labelers and dependency parsers of the kind available for English, we present a more robust approach that (i) generates candidate word sequences corresponding to opinion entities and (ii) subsequently disambiguates these sequences as opinion holders or targets. Detecting the boundaries of such candidate sequences in Urdu differs considerably from English because, in Urdu, grammatical categories such as tense, gender and case are captured in word inflections. We exploit the morphological inflections associated with nouns and verbs to correctly identify sequence boundaries. Different levels of information that capture context are encoded to train standard linear and sequence kernels. The best performance obtained for opinion entity detection for Urdu sentiment analysis is an F-score of 58.06% using sequence kernels and 61.55% using a combination of sequence and linear kernels.
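
To make the idea of a sequence kernel concrete, here is a minimal Python sketch of one common formulation, the gap-weighted subsequence kernel, applied to token sequences. This is an assumption: the abstract does not say which sequence kernel, features, or combination scheme the paper uses. The function names, the decay parameter `lam`, the subsequence length `n`, and the choice to combine kernels by simple addition are all hypothetical and serve only as illustration.

```python
import math


def subsequence_kernel(s, t, n=2, lam=0.5):
    """Gap-weighted subsequence kernel of order n between token sequences.

    Each common subsequence of length n contributes lam ** (span in s +
    span in t), so subsequences with gaps are down-weighted.
    """
    ls, lt = len(s), len(t)
    # kp[i][a][b] holds the auxiliary kernel K'_i for prefixes s[:a], t[:b].
    kp = [[[0.0] * (lt + 1) for _ in range(ls + 1)] for _ in range(n)]
    for a in range(ls + 1):
        for b in range(lt + 1):
            kp[0][a][b] = 1.0
    for i in range(1, n):
        for a in range(i, ls + 1):
            kpp = 0.0  # running K''_i value for the current prefix of s
            for b in range(i, lt + 1):
                match = s[a - 1] == t[b - 1]
                kpp = lam * kpp + (lam * lam * kp[i - 1][a - 1][b - 1] if match else 0.0)
                kp[i][a][b] = lam * kp[i][a - 1][b] + kpp
    total = 0.0
    for a in range(n, ls + 1):
        for b in range(n, lt + 1):
            if s[a - 1] == t[b - 1]:
                total += lam * lam * kp[n - 1][a - 1][b - 1]
    return total


def normalized_kernel(s, t, n=2, lam=0.5):
    """Length-normalized kernel: 1.0 for identical sequences, 0.0 if undefined."""
    kss = subsequence_kernel(s, s, n, lam)
    ktt = subsequence_kernel(t, t, n, lam)
    if kss == 0.0 or ktt == 0.0:
        return 0.0
    return subsequence_kernel(s, t, n, lam) / math.sqrt(kss * ktt)


def linear_kernel(x, y):
    """Plain dot product over fixed-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def combined_kernel(s, t, x, y, n=2, lam=0.5):
    """Assumed combination: the sum of two valid kernels is itself a valid kernel."""
    return normalized_kernel(s, t, n, lam) + linear_kernel(x, y)


# Made-up tag sequences standing in for two candidate word sequences.
seq_a = ["NN", "CASE", "VB"]
seq_b = ["NN", "ADJ", "VB"]
similarity = normalized_kernel(seq_a, seq_b)
```

A Gram matrix built from such a function could be passed to an SVM implementation that accepts precomputed kernels. Whether the paper trains its models this way is not stated in the abstract.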