Two Phase Extraction Method for Multi-label Classification of Real Life Tweets

Authors:
Shuhei Yamamoto;Tetsuji Satoh
Affiliations:
Graduate School of Library, Information and Media Studies, University of Tsukuba, 1-2 Kasuga, Tsukuba-city, Ibaraki, Japan;Graduate School of Library, Information and Media Studies, University of Tsukuba, 1-2 Kasuga, Tsukuba-city, Ibaraki, Japan
Venue:
Proceedings of International Conference on Information Integration and Web-based Applications & Services
Year:
2013

Abstract

Many users now share their daily events and opinions on Twitter. Some of these tweets are beneficial because they comment on aspects of a user's real life, e.g., eating, traffic, weather, and disasters. A post such as "The train is not coming!" is categorized in the "Traffic" aspect and helps users who want to ride the train. A tweet such as "The train is not coming due to heavy rain" is categorized in both the "Traffic" and "Weather" aspects. In this paper, we propose a multi-label method that estimates appropriate aspects for unknown tweets by extending the two-phase extraction method. In this method, many topics are extracted from a large collection of tweets using Latent Dirichlet Allocation (LDA). Associations between the many topics and the fewer aspects are built using a small set of labeled tweets. Aspect scores for unknown tweets are calculated from the terms extracted from each tweet, using these associations between topics and aspects. Unknown tweets are then labeled with appropriate aspects by averaging the aspect scores. When one aspect score is much larger than the others, only that aspect is estimated for the tweet. When several aspect scores are large and similar in value, all of these aspects are estimated. Experimental evaluations on a large number of actual tweets demonstrate the high efficiency of the proposed multi-label classification method. Based on the evaluation results, our prototype system demonstrates that the proposed method can appropriately estimate one or more aspects of each unknown tweet.
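
The scoring and labeling steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The term-to-topic weights, the topic-to-aspect associations, the function names, and the `ratio` threshold are all hypothetical values chosen for illustration; in the paper the topics come from LDA and the associations from labeled tweets, and the abstract does not state the exact rule for deciding when scores are "similar".

```python
from collections import defaultdict

# Hypothetical toy inputs (assumptions, not data from the paper).
# TERM_TOPICS[term][topic]: how strongly a term belongs to each topic.
TERM_TOPICS = {
    "train": {0: 0.9, 1: 0.1},
    "rain": {0: 0.1, 1: 0.9},
    "delayed": {0: 0.8, 1: 0.2},
}
# TOPIC_ASPECTS[topic][aspect]: association between a topic and an aspect.
TOPIC_ASPECTS = {
    0: {"Traffic": 0.9, "Weather": 0.1},
    1: {"Traffic": 0.1, "Weather": 0.9},
}


def aspect_scores(terms):
    """Average the per-term aspect scores over the known terms of a tweet."""
    known = [t for t in terms if t in TERM_TOPICS]
    if not known:
        return {}
    totals = defaultdict(float)
    for term in known:
        for topic, weight in TERM_TOPICS[term].items():
            for aspect, assoc in TOPIC_ASPECTS[topic].items():
                totals[aspect] += weight * assoc
    return {aspect: total / len(known) for aspect, total in totals.items()}


def select_aspects(scores, ratio=0.8):
    """Keep every aspect whose score is within `ratio` of the top score.

    The relative threshold is an assumed stand-in for the paper's rule.
    """
    if not scores:
        return []
    top = max(scores.values())
    return sorted(a for a, s in scores.items() if s >= ratio * top)


single = select_aspects(aspect_scores(["train", "delayed"]))
multi = select_aspects(aspect_scores(["train", "rain"]))
print(single)  # one dominant aspect
print(multi)   # two aspects with similar scores
```

With these toy weights, a tweet containing only traffic-related terms receives a single label, while a tweet mixing "train" and "rain" receives both labels because its two aspect scores are nearly equal.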