Social Streams Blog Crawler

  • Authors:
  • Matthew Hurst;Alexey Maykov

  • Affiliations:
  • -;-

  • Venue:
  • ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Weblogs, and other forms of social media, differ from traditional web content in many ways. One of the most important differences is the highly temporal nature of the content. Applications that leverage social media content must, to be effective, have access to this data with minimal publication/acquisition latency. An effective weblog crawler should satisfy the following requirements: low latency, highly scalable, high data quality and appropriate network politeness. In this paper, we outline the weblog crawler implemented in the social streams project and summarize the challenges faced during development.