Estimating clustering coefficients and size of social networks via random walk

Authors:
Stephen J. Hardiman; Liran Katzir
Affiliations:
Capital Fund Management, Paris, France; Microsoft Research, Herzliya, Israel
Venue:
Proceedings of the 22nd international conference on World Wide Web
Year:
2013

Abstract

Online social networks have become a major force in today's society and economy. The largest of today's social networks may have hundreds of millions to more than a billion users. Such networks are too large to be downloaded or stored locally, even if terms of use and privacy policies were to permit doing so. This limitation complicates even simple computational tasks. One such task is computing the clustering coefficient of a network. Another is computing the network size (number of registered users) or the size of a subpopulation. The clustering coefficient, a classic measure of network connectivity, comes in two flavors: global and network average. In this work, we provide efficient algorithms for estimating these measures which (1) assume no prior knowledge about the network; and (2) access the network using only the publicly available interface. More precisely, this work provides three new estimation algorithms: (a) the first external access algorithm for estimating the global clustering coefficient; (b) an external access algorithm that improves on the accuracy of previous network average clustering coefficient estimation algorithms; and (c) an improved external access network size estimation algorithm. The main insight offered by this work is that only a relatively small number of public interface calls is required for our algorithms to achieve high-accuracy estimates. Our approach is to view a social network as an undirected graph and use the public interface to retrieve a random walk. To estimate the clustering coefficient, the connectivity of each node in the random walk sequence is tested in turn. We show that the error of this estimation drops exponentially in the number of random walk steps. Another insight of this work is that, although the proposed algorithms can be used to estimate the clustering coefficient of any undirected graph, they are particularly efficient on social network-like graphs.
To improve on prior-art network size estimation algorithms, we count node collisions one step before they actually occur. In our experiments, we validate our algorithms on several publicly available social network datasets. Our results validate the theoretical claims and demonstrate the effectiveness of our algorithms.
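
The random-walk idea described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names (`random_walk`, `estimate_clustering`) and the `neighbors` callable, which stands in for one call to the network's public interface, are hypothetical. The specific weighting is also an assumption. It follows from the fact that a simple random walk on an undirected graph visits nodes in proportion to their degree, so the test "are the nodes visited just before and just after this one adjacent?" is reweighted by degree to recover the global and network average clustering coefficients as a ratio of two walk averages.

```python
import random
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Node = Hashable


def random_walk(neighbors: Callable[[Node], Sequence[Node]],
                start: Node, steps: int, rng: random.Random) -> List[Node]:
    """Simple random walk: each step moves to a uniformly chosen neighbor."""
    walk = [start]
    for _ in range(steps - 1):
        walk.append(rng.choice(list(neighbors(walk[-1]))))
    return walk


def estimate_clustering(walk: Sequence[Node],
                        neighbors: Callable[[Node], Sequence[Node]]
                        ) -> Tuple[float, float]:
    """Return (global, network-average) clustering estimates from one walk.

    Assumed weighting (not stated in the abstract): the adjacency indicator
    at step k is weighted by d_k for the global estimate and by 1/(d_k - 1)
    for the network-average estimate, then normalised by walk averages of
    (d - 1) and 1/d respectively.
    """
    r = len(walk)
    if r < 3:
        raise ValueError("need at least 3 walk steps")

    cache: Dict[Node, frozenset] = {}  # one interface call per distinct node

    def nbrs(v: Node) -> frozenset:
        if v not in cache:
            cache[v] = frozenset(neighbors(v))
        return cache[v]

    deg = [len(nbrs(v)) for v in walk]
    phi_global = 0.0
    phi_local = 0.0
    for k in range(1, r - 1):
        prev_node, next_node = walk[k - 1], walk[k + 1]
        # Indicator: the nodes visited before and after walk[k] are adjacent.
        # Two distinct neighbors imply deg[k] >= 2, so deg[k] - 1 is nonzero.
        if prev_node != next_node and next_node in nbrs(prev_node):
            phi_global += deg[k]
            phi_local += 1.0 / (deg[k] - 1)
    phi_global /= r - 2
    phi_local /= r - 2

    psi_global = sum(d - 1 for d in deg) / r
    psi_local = sum(1.0 / d for d in deg) / r

    c_global = phi_global / psi_global if psi_global > 0 else 0.0
    return c_global, phi_local / psi_local


if __name__ == "__main__":
    # Toy graph: the complete graph on 4 nodes, whose clustering is 1.
    k4 = {v: [u for u in range(4) if u != v] for v in range(4)}
    walk = random_walk(k4.__getitem__, 0, 20000, random.Random(0))
    print(estimate_clustering(walk, k4.__getitem__))
```

In this sketch the only access to the graph is through `neighbors`, mirroring the external-access setting; the cache keeps the number of simulated interface calls at one per distinct node visited. On a triangle-free graph such as a star, both estimates are exactly zero for any walk, since the adjacency test never succeeds.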