Graph-based collective classification for tweets

Authors:
Yajuan Duan;Furu Wei;Ming Zhou;Heung-Yeung Shum
Affiliations:
University of Science and Technology of China, Hefei, China;Microsoft Research Asia, Beijing, China;Microsoft Research Asia, Beijing, China;Microsoft Corporation, Redmond, USA
Venue:
Proceedings of the 21st ACM international conference on Information and knowledge management
Year:
2012

Citing 4
Cited 1

Learning to classify short and sparse text & web with hidden topics from large-scale data collections

Proceedings of the 17th international conference on World Wide Web
Beyond Microblogging: Conversation and Collaboration via Twitter

HICSS '09 Proceedings of the 42nd Hawaii International Conference on System Sciences
Short text classification in twitter to improve information filtering

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Classification of short texts by deploying topical annotations

ECIR'12 Proceedings of the 34th European conference on Advances in Information Retrieval

Classifying microblogs for disasters

Proceedings of the 18th Australasian Document Computing Symposium

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper, we address the problem of classifying tweets into topical categories. Because of the short, noisy and ambiguous nature of tweets, we propose to collectively conduct the classification by exploiting the context information (i.e. related tweets) other than individually as in conventional text classification methods. In particular, we augment the content-based representation of text with tweets sharing same #hashtag or URL, which results in a tweet graph. We then formulate the tweet classification task under a graph optimization framework. We investigate three popular approaches, namely, Loopy Belief Propagation (LBP), Relaxation Labeling (RL), and Iterative Classification Algorithm (ICA). Extensive experiment results show that the graph-based tweet classification approach remarkably improves the performance, while the ICA model with relationship of sharing the same #hashtag gives the best result on separate tweet graph.