Many applications are now built on top of microblogging services such as Twitter. Engineering Web applications that operate on microblogging data requires filtering techniques that identify the relevant messages. In this paper, we focus on detecting Twitter messages (tweets) that report on social events. We introduce a filtering pipeline that exploits textual features and n-grams to classify tweets as event-related or non-event-related. We analyze the impact of preprocessing techniques and achieve accuracies higher than 80%. Because the pipeline requires training data, we also present a strategy for automating the labeling of that data. On our dataset, this semi-automated method achieves an accuracy of 79%, which is comparable to the manual labeling approach.
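To make the idea of such a filtering pipeline concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes three stages: a simple preprocessing step (lowercasing, replacing URLs and @mentions with placeholder tokens), word unigram and bigram features, and a multinomial Naive Bayes classifier with Laplace smoothing. The abstract does not name the classifier or the exact preprocessing steps, so these choices, the names `preprocess`, `ngrams`, and `NaiveBayesFilter`, and the toy training tweets are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z0-9#_']+")


def preprocess(text):
    """Lowercase the tweet and replace URLs and @mentions with placeholders.

    These steps are assumed for illustration; the paper's actual
    preprocessing techniques are not listed in the abstract.
    """
    text = text.lower()
    text = URL_RE.sub(" _url_ ", text)
    text = MENTION_RE.sub(" _user_ ", text)
    return TOKEN_RE.findall(text)


def ngrams(tokens, max_n=2):
    """Return all word n-grams of length 1..max_n as space-joined strings."""
    feats = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


class NaiveBayesFilter:
    """Multinomial Naive Bayes over n-gram counts (an assumed classifier)."""

    def __init__(self, max_n=2, alpha=1.0):
        self.max_n = max_n
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = ngrams(preprocess(text), self.max_n)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        feats = ngrams(preprocess(text), self.max_n)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for f in feats:
                if f in self.vocab:  # ignore n-grams never seen in training
                    score += math.log((counts[f] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, hand-written training tweets (invented for this sketch, not real data).
train = [
    ("Concert tonight at the city hall, doors open 8pm", "event"),
    ("Live music festival this weekend in the park http://example.org", "event"),
    ("Football match starts at 7pm in the stadium", "event"),
    ("I love my new phone so much", "non-event"),
    ("Feeling tired, need more coffee", "non-event"),
    ("@bob thanks for the follow", "non-event"),
]
clf = NaiveBayesFilter().fit([t for t, _ in train], [y for _, y in train])
print(clf.predict("Jazz concert tonight in the park"))
print(clf.predict("need more coffee, feeling tired"))
```

With a realistic amount of labeled data, the same structure would let one switch individual preprocessing steps on or off and compare the resulting accuracy, which is the kind of analysis the abstract describes.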