Analyzing Urdu social media for sentiments using transfer learning with controlled translations

Authors:
Smruthi Mukund;Rohini K. Srihari
Affiliations:
University at Buffalo, SUNY, Buffalo, NY;University at Buffalo, SUNY, Buffalo, NY
Venue:
LSM '12 Proceedings of the Second Workshop on Language in Social Media
Year:
2012

Abstract

The main aim of this work is to perform sentiment analysis on Urdu blog data. We use structural correspondence learning (SCL) to transfer sentiment analysis learning from Urdu newswire data to Urdu blog data. Generating the pivots needed to transfer learning from the newswire domain to the blog domain is not trivial because Urdu blog data, unlike newswire data, is written in Latin script and exhibits code-mixing and code-switching behavior. We consider two oracles to generate the pivots: (1) a transliteration oracle, to accommodate script variation and spelling variation, and (2) a translation oracle, to accommodate code-switching and code-mixing behavior. To identify strong candidates for translation, we propose a novel part-of-speech tagging method that helps select words based on POS categories that strongly reflect code-mixing behavior. We validate our approach against a supervised learning method and show that the performance of our proposed approach is comparable.