Walking on a graph with a magnifying glass: stratified sampling via weighted random walks
Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Our objective is to sample the node set of a large unknown graph via crawling in order to accurately estimate a given metric of interest. We design a random walk on an appropriately defined weighted graph that achieves high efficiency by preferentially crawling those nodes and edges that convey greater information regarding the target metric. Our approach begins by employing the theory of stratification to find optimal node weights, for a given estimation problem, under an independence sampler. While optimal under independence sampling, these weights may be impractical under graph crawling due to constraints arising from the structure of the graph. Therefore, the edge weights for our random walk should be chosen so as to lead to an equilibrium distribution that strikes a balance between approximating the optimal weights under an independence sampler and achieving fast convergence. We propose a heuristic approach (stratified weighted random walk, or S-WRW) that achieves this goal while using only limited information about the graph structure and the node properties. We evaluate our technique both in simulation and experimentally, by collecting a sample of Facebook college users. We show that S-WRW requires 13-15 times fewer samples than the simple re-weighted random walk (RW) to achieve the same estimation accuracy for a range of metrics.
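The re-weighting idea behind a weighted random walk can be sketched in a few lines of Python. This sketch is illustrative only and is not the S-WRW weight-setting procedure itself: the toy graph `ADJ`, the set `RARE`, the boost factor 5.0, and all function names are assumptions made for the example. It walks a graph with symmetric edge weights, so that in equilibrium a node is visited in proportion to the total weight of its incident edges, and then corrects for that bias with a ratio (Hansen-Hurwitz style) estimator.

```python
import random

# Hypothetical toy graph (undirected adjacency lists); nodes 0 and 5 form
# the "rare" category whose relative size we want to estimate.
ADJ = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 4],
       3: [1, 4, 5], 4: [2, 3, 5], 5: [3, 4]}
RARE = {0, 5}


def edge_weight(u, v):
    """Assumed symmetric weight: boost edges that touch a rare node."""
    return 5.0 if (u in RARE or v in RARE) else 1.0


def weighted_random_walk(adj, weight, start, steps, rng):
    """From node u, move to neighbor v with probability proportional to weight(u, v)."""
    node, visited = start, []
    for _ in range(steps):
        neighbors = adj[node]
        weights = [weight(node, v) for v in neighbors]
        node = rng.choices(neighbors, weights=weights, k=1)[0]
        visited.append(node)
    return visited


def strength(adj, weight, u):
    """Total weight of edges incident to u; the equilibrium visit
    probability of u is proportional to this value."""
    return sum(weight(u, v) for v in adj[u])


def reweighted_mean(samples, adj, weight, f):
    """Ratio estimator of the node average of f: each sample is
    down-weighted by the strength of the sampled node."""
    num = den = 0.0
    for u in samples:
        s = strength(adj, weight, u)
        num += f(u) / s
        den += 1.0 / s
    return num / den


walk = weighted_random_walk(ADJ, edge_weight, 0, 20000, random.Random(0))
estimate = reweighted_mean(walk, ADJ, edge_weight,
                           lambda u: 1.0 if u in RARE else 0.0)
print("estimated fraction of rare nodes:", estimate)  # true value is 2/6
```

Boosting edges that touch the rare category makes the walk visit those nodes more often, and dividing by the node strength removes the resulting bias from the estimate.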