A vector space model for subjectivity classification in Urdu aided by co-training

Authors: Smruthi Mukund; Rohini K. Srihari
Affiliations: University at Buffalo; University at Buffalo
Venue: COLING '10: Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Year: 2010

Abstract

The goal of this work is to produce a classifier that can distinguish subjective sentences from objective sentences in Urdu. Labeled data for training automatic classifiers can be highly imbalanced, especially in the multilingual setting, because generating annotations is expensive. In this work, we propose a co-training approach for subjectivity analysis in Urdu that augments the positive (subjective) set and generates a negative (objective) set devoid of all samples close to the positive ones. Using the data set thus generated for training, we conduct experiments with support vector machine (SVM) and vector space model (VSM) algorithms, and show that our modified VSM-based approach works remarkably well as a sentence-level subjectivity classifier.
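As a rough illustration of the two ideas in the abstract, the sketch below builds TF-IDF vectors, forms a negative set by discarding unlabeled sentences that lie close to the positive (subjective) centroid, and then classifies by nearest centroid under cosine similarity. It is a minimal sketch under stated assumptions, not the authors' method: it does not implement co-training or the paper's modified VSM. The whitespace tokenizer, the smoothed IDF formula, the similarity threshold of 0.2, the Rocchio-style centroid classifier, and every function name here are hypothetical choices made for illustration, and the toy sentences are English placeholders rather than data from the paper.

```python
import math
from collections import Counter

def tokenize(sentence):
    # Hypothetical tokenizer: plain whitespace split. Real Urdu text would
    # need proper normalization and segmentation.
    return sentence.split()

def idf_table(docs):
    # Smoothed inverse document frequency over a list of token lists
    # (assumed formula: log((1 + n) / (1 + df)) + 1).
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def tfidf(tokens, idf):
    # Sparse TF-IDF vector as a dict; terms missing from the IDF table are ignored.
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}

def cosine(u, v):
    # Cosine similarity between two sparse vectors (0.0 if either is empty).
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    # Mean of sparse vectors: the class prototype in the vector space.
    if not vectors:
        return {}
    total = {}
    for vec in vectors:
        for t, w in vec.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}

def build_negative_set(positive_vecs, unlabeled_vecs, threshold=0.2):
    # Keep only unlabeled sentences whose similarity to the positive (subjective)
    # centroid is below a hypothetical threshold, i.e. drop samples that look
    # "close" to the positive set.
    pos_c = centroid(positive_vecs)
    return [v for v in unlabeled_vecs if cosine(v, pos_c) < threshold]

def classify(vec, pos_c, neg_c):
    # Assign the class whose centroid is more similar under cosine similarity.
    return "subjective" if cosine(vec, pos_c) >= cosine(vec, neg_c) else "objective"

# Toy usage with English placeholder sentences (not data from the paper).
positives = ["i love this great film", "what a wonderful great day"]
unlabeled = ["the film was released in 2010", "i love this wonderful film",
             "the meeting is on monday"]
idf = idf_table([tokenize(s) for s in positives + unlabeled])
pos_vecs = [tfidf(tokenize(s), idf) for s in positives]
unl_vecs = [tfidf(tokenize(s), idf) for s in unlabeled]
neg_vecs = build_negative_set(pos_vecs, unl_vecs)
pos_c, neg_c = centroid(pos_vecs), centroid(neg_vecs)
print(len(neg_vecs))  # 2: the subjective-looking unlabeled sentence is filtered out
print(classify(tfidf(tokenize("great wonderful film i love"), idf), pos_c, neg_c))  # subjective
print(classify(tfidf(tokenize("the meeting was released on monday"), idf), pos_c, neg_c))  # objective
```

The threshold controls how aggressively the negative set is pruned. A lower value keeps fewer, more clearly objective sentences, which matches the abstract's goal of a negative set free of samples near the positives, at the cost of a smaller objective training set.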