Topic-wise, sentiment-wise, or otherwise?: Identifying the hidden dimension for unsupervised text classification

Authors:
Sajib Dasgupta; Vincent Ng
Affiliations:
University of Texas at Dallas, Richardson, TX; University of Texas at Dallas, Richardson, TX
Venue:
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2
Year:
2009

Abstract

While traditional work on text clustering has largely focused on grouping documents by topic, it is conceivable that a user may want to cluster documents along other dimensions, such as the author's mood, gender, age, or sentiment. Without knowing the user's intention, a clustering algorithm will only group documents along the most prominent dimension, which may not be the one the user desires. To address this problem, we propose a novel way of incorporating user feedback into a clustering algorithm, which allows a user to easily specify the dimension along which she wants the data points to be clustered via inspecting only a small number of words. This distinguishes our method from existing ones, which typically require a large amount of effort on the part of humans in the form of document annotation or interactive construction of the feature space. We demonstrate the viability of our method on several challenging sentiment datasets.
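
As a rough illustration of the "inspecting only a small number of words" idea, the sketch below takes several candidate two-way clusterings of the same documents and, for each one, lists the words most characteristic of each cluster, so that a user could choose the clustering whose word lists match the dimension she has in mind. This is a minimal sketch under stated assumptions and not the authors' algorithm: the abstract does not say how candidate clusterings are produced or how words are ranked. The function name top_words, the smoothed log-odds score, the toy documents, and the hand-written candidate clusterings are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def top_words(docs, labels, k=3):
    """Return the k words most characteristic of each cluster (0 and 1).

    Assumption for this sketch: words are ranked by the add-one smoothed
    log-odds of their relative frequency inside the cluster versus the
    other cluster. Ties are broken alphabetically.
    """
    counts = {0: Counter(), 1: Counter()}
    for doc, label in zip(docs, labels):
        counts[label].update(doc.lower().split())

    vocab = set(counts[0]) | set(counts[1])
    # Denominators include the vocabulary size because of add-one smoothing.
    totals = {c: sum(counts[c].values()) + len(vocab) for c in (0, 1)}

    def score(word, c):
        p_in = (counts[c][word] + 1) / totals[c]
        p_out = (counts[1 - c][word] + 1) / totals[1 - c]
        return math.log(p_in / p_out)

    return {
        c: sorted(vocab, key=lambda w: (-score(w, c), w))[:k]
        for c in (0, 1)
    }


# Toy documents that vary along two dimensions: topic and sentiment.
docs = [
    "great camera excellent pictures",
    "terrible camera awful pictures",
    "great book excellent story",
    "terrible book awful story",
]

# Hypothetical candidate clusterings; in practice a clustering
# algorithm would produce these.
candidates = {
    "A": [0, 0, 1, 1],  # happens to split by topic
    "B": [0, 1, 0, 1],  # happens to split by sentiment
}

# Show the user a short word list per cluster for each candidate.
for name, labels in candidates.items():
    print(name, top_words(docs, labels, k=2))
```

On this toy data, candidate A is summarized by product words (camera, pictures versus book, story) and candidate B by opinion words (excellent, great versus awful, terrible), so a user interested in sentiment would pick B after reading four words per candidate rather than annotating any documents.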