Automatic labeling of semantic roles
Computational Linguistics
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Supervised and unsupervised PCFG adaptation to novel domains
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data
The Journal of Machine Learning Research
Domain adaptation with structural correspondence learning
EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Domain adaptation for statistical classifiers
Journal of Artificial Intelligence Research
Analysis and development of Urdu POS tagged corpus
ALR7 Proceedings of the 7th Workshop on Asian Language Resources
Cross-language text classification using structural correspondence learning
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
A vector space model for subjectivity classification in Urdu aided by co-training
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
LIBSVM: A library for support vector machines
ACM Transactions on Intelligent Systems and Technology (TIST)
Subjectivity and sentiment analysis of modern standard Arabic
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Using sequence kernels to identify opinion entities in Urdu
CoNLL '11 Proceedings of the Fifteenth Conference on Computational Natural Language Learning
Hi-index | 0.00 |
The main aim of this work is to perform sentiment analysis on Urdu blog data. We use the method of structural correspondence learning (SCL) to transfer sentiment analysis learning from Urdu newswire data to Urdu blog data. The pivots needed to transfer learning from newswire domain to blog domain is not trivial as Urdu blog data, unlike newswire data is written in Latin script and exhibits code-mixing and code-switching behavior. We consider two oracles to generate the pivots. 1. Transliteration oracle, to accommodate script variation and spelling variation and 2. Translation oracle, to accommodate code-switching and code-mixing behavior. In order to identify strong candidates for translation, we propose a novel part-of-speech tagging method that helps select words based on POS categories that strongly reflect code-mixing behavior. We validate our approach against a supervised learning method and show that the performance of our proposed approach is comparable.