A Comparative Study on Feature Selection in Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
The Journal of Machine Learning Research
WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
Characterizing debate performance via aggregated twitter sentiment
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Earthquake shakes Twitter users: real-time event detection by social sensors
Proceedings of the 19th international conference on World wide web
TwitterMonitor: trend detection over the twitter stream
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Discovering users' topics of interest on twitter: a first look
AND '10 Proceedings of the fourth workshop on Analytics for noisy unstructured text data
Empirical study of topic modeling in Twitter
Proceedings of the First Workshop on Social Media Analytics
Topical keyphrase extraction from Twitter
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Blog map of experiences: extracting and geographically mapping visitor experiences from urban blogs
WISE'05 Proceedings of the 6th international conference on Web Information Systems Engineering
Hi-index | 0.00 |
Recently, many users share their daily events and opinions on Twitter. Some are beneficial and comment on several aspects of a user's real life, i.e., eating, traffic, weather, disasters, and so on. Such posts as "The train is not coming!" are categorized in the "Traffic" aspect and will support users who want to ride the train. Such tweets as "The train is not coming due to heavy rain" are categorized in both the "Traffic" and "Weather" aspects. In this paper, we propose a multi-label method that estimates appropriate aspects against unknown tweets by extending the two phase extraction method. In it, many topics are extracted from a sea of tweets using Latent Dirichlet Allocation (LDA). Associations among many topics and fewer aspects are built using a small set of labeled tweets. Aspect scores for unknown tweets are calculated using the associations among the topics and the aspects based on the extracted terms. Appropriate aspects are labeled for unknown tweets by averaging of the aspect scores. Using a large amount of actual tweets, our sophisticated experimental evaluations demonstrate the high efficiency of our proposed multi-label classification method. When an aspect score is much larger than others, that aspect is estimated against the tweet. When several aspect scores are large within similar values, these aspects are estimated. Based on the experimental evaluation results, our prototype system demonstrates that our proposed method can appropriately estimate some aspects of each unknown tweets.