Harnessing web page directories for large-scale classification of tweets

  • Authors:
  • Arkaitz Zubiaga; Heng Ji

  • Affiliations:
  • City University of New York, New York, NY, USA; City University of New York, New York, NY, USA

  • Venue:
  • Proceedings of the 22nd international conference on World Wide Web companion
  • Year:
  • 2013

Abstract

Classification is essential for processing tweets effectively, but classifier performance is hindered by the need for large training sets that cover the diversity of content found on Twitter. In this paper, we introduce an inexpensive way of labeling large sets of tweets, which can be easily regenerated or updated when needed. We use human-edited web page directories to infer categories from the URLs contained in tweets. Experimenting with a large set of more than 5 million tweets categorized in this way, we show that our proposed model for tweet classification achieves 82% accuracy, performing only 12.2% worse than web page classification.
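
The labeling idea in the abstract (inferring a tweet's category from the URLs it contains, via a human-edited directory) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `DIRECTORY` mapping, its hosts and category names, and the helpers `host_of`, `label_tweet`, and `build_training_set` are all hypothetical, and the sketch assumes that URLs in tweets have already been expanded from any shortened form and that a lookup by host name is sufficient.

```python
import re
from urllib.parse import urlparse

# Hypothetical stand-in for a human-edited web page directory:
# host name -> category. Entries are illustrative only.
DIRECTORY = {
    "espn.com": "Sports",
    "nature.com": "Science",
}

# Simple pattern for http(s) URLs; assumes URLs are already expanded.
URL_RE = re.compile(r"https?://\S+")


def host_of(url):
    """Return the lowercased host of a URL without a leading 'www.'."""
    # Strip punctuation that often trails a URL at the end of a sentence.
    host = urlparse(url.rstrip(".,;:!?)")).hostname or ""
    return host[4:] if host.startswith("www.") else host


def label_tweet(text, directory=DIRECTORY):
    """Return the category of the first URL found in the directory, else None."""
    for url in URL_RE.findall(text):
        category = directory.get(host_of(url))
        if category is not None:
            return category
    return None


def build_training_set(tweets, directory=DIRECTORY):
    """Keep only tweets whose URLs yield a category, paired with that label."""
    labeled = []
    for text in tweets:
        category = label_tweet(text, directory)
        if category is not None:
            labeled.append((text, category))
    return labeled
```

Because the labels come from a lookup rather than manual annotation, rerunning `build_training_set` over a fresh batch of tweets or an updated directory regenerates the training data, which matches the "easily regenerated or updated" property the abstract claims. The resulting `(text, category)` pairs could then be fed to any text classifier.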