Sampling from large graphs is an area of great interest, particularly with the recent emergence of huge structures such as online social networks, which often contain hundreds of millions of vertices and billions of edges. The size of these networks makes it computationally expensive to obtain structural properties of the underlying graph by exhaustive search. If these properties can be estimated from small but representative samples of the network, size is no longer a problem. In this paper we develop an analysis of random walks, a commonly used method of sampling from networks. We present a method of biasing the random walk to acquire a complete sample of the high-degree vertices of social networks or similar graphs. The preferential attachment model is a common method of generating graphs with a power-law degree sequence. For this model, we prove that the sampling method is successful with high probability. We also make experimental studies of the method on various real-world networks. For $t$-vertex graphs $G(t)$ generated by a preferential attachment process, we analyze a biased random walk which makes transitions along undirected edges $\{x,y\}$ with probability proportional to $[d(x)d(y)]^b$, where $d(x)$ is the degree of vertex $x$ and $b > 0$ is a constant parameter. Let $S(a)$ be the set of all vertices of degree at least $t^a$ in $G(t)$. We show that for some $b \approx 2/3$, if the biased random walk starts at an arbitrary vertex of $S(a)$, then with high probability the set $S(a)$ can be discovered completely in $\tilde{O}(t^{1-(4/3)a+\delta})$ steps, where $\delta$ is a very small positive constant. The notation $\tilde{O}$ ignores poly-log $t$ factors. The preferential attachment process generates graphs with power law 3, so the above example is a special case of the general result for power law $c$.
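To make the transition rule concrete, the biased walk can be written as a weighted random walk. The formulas below are a sketch based on the standard theory of reversible walks on weighted graphs, not a statement taken from the paper. The notation $N(x)$ for the neighbours of $x$ (counted with multiplicity if there are parallel edges) and $\pi$ for the stationary distribution are assumptions introduced here, and the graph is assumed to be connected.

```latex
% Edge weight w(x,y) = (d(x) d(y))^b on each undirected edge {x,y}.
\[
P(x,y) \;=\; \frac{(d(x)\,d(y))^{b}}{\sum_{z \in N(x)} (d(x)\,d(z))^{b}}
       \;=\; \frac{d(y)^{b}}{\sum_{z \in N(x)} d(z)^{b}},
\qquad
\pi(x) \;=\; \frac{\sum_{z \in N(x)} (d(x)\,d(z))^{b}}
                  {\sum_{u} \sum_{z \in N(u)} (d(u)\,d(z))^{b}} .
\]
```

The factor $d(x)^b$ cancels in $P(x,y)$, so a step from $x$ only needs the degrees of the neighbours of $x$. Setting $b = 0$ recovers the simple random walk, with $\pi(x)$ proportional to $d(x)$.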
For graphs with degree sequence power law $c > 2$ generated by a generalized preferential attachment process, a random walk with transitions along undirected edges $\{x,y\}$ proportional to $(d(x)d(y))^{(c-2)/2}$ discovers the set $S(a)$ completely in $\tilde{O}(t^{1-a(c-2)+\delta})$ steps with high probability. The cover time of the graph is $\tilde{O}(t)$. Our results say that if we search preferential attachment graphs with a bias $b = (c-2)/2$ that depends linearly on the power law $c$, then (i) we can find all high-degree vertices quickly, and (ii) the time to discover all vertices is not much higher than in the case of a simple random walk. We conduct experimental tests on generated networks and real-world networks, which confirm these two properties.
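The sampling procedure described in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`preferential_attachment_graph`, `high_degree_vertices`, `biased_walk`), the seed graph, the parameter values, and the choice to start from the maximum-degree vertex are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def preferential_attachment_graph(t, m=2, seed=0):
    """Grow a t-vertex multigraph (t >= 2); each new vertex attaches m edges
    to existing vertices chosen with probability proportional to degree."""
    rng = random.Random(seed)
    adj = defaultdict(list)
    endpoints = []  # vertex v appears d(v) times, so rng.choice is degree-biased
    for _ in range(m):  # assumed seed graph: vertices 0 and 1, m parallel edges
        adj[0].append(1)
        adj[1].append(0)
        endpoints += [0, 1]
    for v in range(2, t):
        targets = [rng.choice(endpoints) for _ in range(m)]
        for u in targets:
            adj[v].append(u)
            adj[u].append(v)
            endpoints += [u, v]
    return dict(adj)


def high_degree_vertices(adj, a):
    """S(a): vertices of degree at least t**a, where t is the vertex count."""
    threshold = len(adj) ** a
    return {v for v, nbrs in adj.items() if len(nbrs) >= threshold}


def biased_walk(adj, start, b, steps, seed=0):
    """Take `steps` transitions, moving x -> y with probability proportional
    to (d(x) * d(y)) ** b. Returns the set of visited vertices."""
    rng = random.Random(seed)
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    weights = {}  # per-vertex cache of neighbour weights
    visited = {start}
    x = start
    for _ in range(steps):
        if x not in weights:
            # d(x)**b is shared by every edge at x, so it cancels on normalising.
            weights[x] = [deg[y] ** b for y in adj[x]]
        x = rng.choices(adj[x], weights=weights[x], k=1)[0]
        visited.add(x)
    return visited


if __name__ == "__main__":
    # Illustrative parameters only; b = 2/3 is the value quoted for power law 3.
    t, a, b = 2000, 0.4, 2 / 3
    graph = preferential_attachment_graph(t)
    targets = high_degree_vertices(graph, a)
    start = max(graph, key=lambda v: len(graph[v]))
    seen = biased_walk(graph, start, b, steps=20 * t)
    print(f"found {len(targets & seen)} of {len(targets)} vertices in S(a)")
```

Running the same walk with `b=0` gives a simple random walk, which offers a rough baseline for comparing how quickly the two walks collect the vertices of $S(a)$.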