Crowd crawling: towards collaborative data collection for large-scale online social networks

  • Authors:
  • Cong Ding; Yang Chen; Xiaoming Fu

  • Affiliations:
  • University of Goettingen, Goettingen, Germany; Duke University, Durham, USA; University of Goettingen, Goettingen, Germany

  • Venue:
  • Proceedings of the First ACM Conference on Online Social Networks
  • Year:
  • 2013

Abstract

Emerging research on online social networks (OSNs) requires huge amounts of data. However, OSN sites typically restrict data crawling, for example by limiting the request rate on a per-IP basis. This makes it challenging for an individual research group to collect sufficient data using only its own network resources. In this paper, we introduce and motivate crowd crawling, which allows multiple research groups to crawl data efficiently and collaboratively. The design of crowd crawling addresses several practical challenges, including the diverse resources of different partners, strict request rate limiting by OSN providers, and data fidelity. We implemented and deployed a crowd crawling prototype on PlanetLab and evaluated its performance. We have made the datasets crawled in our evaluation publicly available.
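Because the rate limit applies per IP, pooling the IP addresses of several partners raises the aggregate crawl rate, but partners with different resources should receive different amounts of work. The following is a minimal sketch of that idea, not the authors' scheduler: it assumes each partner's capacity is simply its number of IPs times a fixed per-IP request rate, and it splits a list of user IDs among partners in proportion to that capacity. The names `Partner`, `assign_tasks` and `crawl_hours`, and all rate values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Partner:
    name: str
    num_ips: int        # hypothetical: distinct IP addresses the partner can crawl from
    per_ip_rate: float  # hypothetical: requests per hour the OSN allows each IP

    @property
    def capacity(self) -> float:
        # With per-IP rate limiting, throughput grows with the number of IPs.
        return self.num_ips * self.per_ip_rate


def assign_tasks(user_ids, partners):
    """Split user IDs among partners in proportion to their request capacity,
    using largest-remainder rounding so every ID is assigned exactly once."""
    total = sum(p.capacity for p in partners)
    n = len(user_ids)
    quotas = [n * p.capacity / total for p in partners]
    counts = [int(q) for q in quotas]
    # Give the leftover IDs to the partners with the largest fractional remainders.
    leftover = n - sum(counts)
    order = sorted(range(len(partners)),
                   key=lambda i: quotas[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    assignment, start = {}, 0
    for p, c in zip(partners, counts):
        assignment[p.name] = user_ids[start:start + c]
        start += c
    return assignment


def crawl_hours(assignment, partners):
    """Wall-clock hours until the slowest partner finishes its share."""
    by_name = {p.name: p for p in partners}
    return max(len(ids) / by_name[name].capacity
               for name, ids in assignment.items())
```

For example, with a hypothetical limit of 150 requests per hour per IP, a partner with one IP and a partner with three IPs receive 300 and 900 of 1,200 user IDs respectively and both finish in 2.0 hours, whereas the single-IP partner alone would need 8 hours. Proportional assignment keeps faster partners from sitting idle while slower ones are still working; a real deployment would also have to handle partner failures and changing limits, which this sketch ignores.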