Crowd crawling: towards collaborative data collection for large-scale online social networks

Authors:
Cong Ding; Yang Chen; Xiaoming Fu
Affiliations:
University of Goettingen, Goettingen, Germany; Duke University, Durham, USA; University of Goettingen, Goettingen, Germany
Venue:
Proceedings of the first ACM conference on Online social networks
Year:
2013

Abstract

Emerging research on online social networks (OSNs) requires large amounts of data. However, OSN sites typically restrict data crawling, for example by limiting the request rate on a per-IP basis. This makes it challenging for an individual research group to collect sufficient data using only its own network resources. In this paper, we introduce and motivate crowd crawling, which allows multiple research groups to crawl data efficiently in a collaborative way. The design of crowd crawling addresses several practical challenges, including the resource diversity of different partners, strict request rate limiting by OSN providers, and data fidelity. We implemented and deployed a crowd crawling prototype on PlanetLab and demonstrated its performance through evaluations. We have made the datasets crawled in our evaluation publicly available.