Harnessing web page directories for large-scale classification of tweets

Authors:
Arkaitz Zubiaga;Heng Ji
Affiliations:
City University of New York, New York, NY, USA;City University of New York, New York, NY, USA
Venue:
Proceedings of the 22nd international conference on World Wide Web companion
Year:
2013

Abstract

Classification is essential for processing tweets effectively, but classifier performance is hindered by the need for large training sets that cover the diversity of content found on Twitter. In this paper, we introduce an inexpensive way of labeling large sets of tweets, which can be easily regenerated or updated when needed. We use human-edited web page directories to infer categories from the URLs contained in tweets. Experimenting with a set of more than 5 million tweets categorized in this way, we show that our proposed model for tweet classification achieves 82% accuracy, performing only 12.2% worse than web page classification.
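The labeling idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline, whose details are not given here. The `DIRECTORY` mapping is a hypothetical stand-in for a human-edited web page directory, the category names and hosts are invented for illustration, and the helper names `host_of` and `label_tweet` are assumptions. The sketch also assumes that URLs in tweets are already expanded (not shortened) and that a tweet is labeled only when all of its known URLs agree on one category.

```python
import re
from urllib.parse import urlparse

# Hypothetical stand-in for a human-edited web page directory: host -> category.
# Hosts and categories here are illustrative only.
DIRECTORY = {
    "espn.com": "Sports",
    "nytimes.com": "News",
    "imdb.com": "Arts",
}

URL_RE = re.compile(r"https?://\S+")


def host_of(url):
    """Return the lowercased host of a URL, without port or a leading 'www.'."""
    # Drop punctuation that often trails a URL at the end of a sentence.
    host = urlparse(url.rstrip(".,;:!?)\"'")).hostname or ""
    return host[4:] if host.startswith("www.") else host


def label_tweet(text, directory=DIRECTORY):
    """Infer a category for a tweet from the URLs it contains.

    Returns (text_without_urls, category), or None when the tweet has no
    URL listed in the directory or when its URLs map to different categories.
    """
    hosts = (host_of(url) for url in URL_RE.findall(text))
    categories = {directory[h] for h in hosts if h in directory}
    if len(categories) != 1:
        return None
    # Remove the URLs so a classifier trained on the result sees only the text.
    cleaned = " ".join(URL_RE.sub(" ", text).split())
    return cleaned, categories.pop()


if __name__ == "__main__":
    print(label_tweet("Great game last night http://www.espn.com/nba/story"))
```

Because the labels come from a lookup rather than manual annotation, rerunning the function over a fresh batch of tweets regenerates the training set, which is the property the abstract highlights.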