The DBLP Computer Science Bibliography: Evolution, Research Issues, Perspectives
SPIRE 2002 Proceedings of the 9th International Symposium on String Processing and Information Retrieval
Counting triangles in data streams
Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Group formation in large social networks: membership, growth, and evolution
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Analysis of topological characteristics of huge online social networking services
Proceedings of the 16th international conference on World Wide Web
Measurement and analysis of online social networks
Proceedings of the 7th ACM SIGCOMM conference on Internet measurement
Simulation and the Monte Carlo Method (Wiley Series in Probability and Statistics)
Simulation and the Monte Carlo Method (Wiley Series in Probability and Statistics)
Growth of the flickr social network
Proceedings of the first workshop on Online social networks
Random sampling from a search engine's index
Journal of the ACM (JACM)
Estimating the impressionrank of web pages
Proceedings of the 18th international conference on World wide web
Walking in facebook: a case study of unbiased sampling of OSNs
INFOCOM'10 Proceedings of the 29th conference on Information communications
Efficient algorithms for large-scale local triangle counting
ACM Transactions on Knowledge Discovery from Data (TKDD)
Measuring the mixing time of social graphs
IMC '10 Proceedings of the 10th ACM SIGCOMM conference on Internet measurement
Estimating and sampling graphs with multidimensional random walks
IMC '10 Proceedings of the 10th ACM SIGCOMM conference on Internet measurement
Estimating sizes of social networks via biased sampling
Proceedings of the 20th international conference on World wide web
Efficient Search Engine Measurements
ACM Transactions on the Web (TWEB)
The mixing time of the Newman: Watts small world
Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms
On estimating the average degree
Proceedings of the 23rd international conference on World wide web
Hi-index | 0.00 |
Online social networks have become a major force in today's society and economy. The largest of today's social networks may have hundreds of millions to more than a billion users. Such networks are too large to be downloaded or stored locally, even if terms of use and privacy policies were to permit doing so. This limitation complicates even simple computational tasks. One such task is computing the clustering coefficient of a network. Another task is to compute the network size (number of registered users) or a subpopulation size. The clustering coefficient, a classic measure of network connectivity, comes in two flavors, global and network average. In this work, we provide efficient algorithms for estimating these measures which (1) assume no prior knowledge about the network; and (2) access the network using only the publicly available interface. More precisely, this work provides three new estimation algorithms (a) the first external access algorithm for estimating the global clustering coefficient; (b) an external access algorithm that improves on the accuracy of previous network average clustering coefficient estimation algorithms; and (c) an improved external access network size estimation algorithm. The main insight offered by this work is that only a relatively small number of public interface calls are required to allow our algorithms to achieve a high accuracy estimation. Our approach is to view a social network as an undirected graph and use the public interface to retrieve a random walk. To estimate the clustering coefficient, the connectivity of each node in the random walk sequence is tested in turn. We show that the error of this estimation drops exponentially in the number of random walk steps. Another insight of this work is the fact that, although the proposed algorithms can be used to estimate the clustering coefficient of any undirected graph, they are particularly efficient on social network-like graphs. To improve the network size prior-art estimation algorithms, we count node collision one step before they actually occur. In our experiments we validate our algorithms on several publicly available social network datasets. Our results validate the theoretical claims and demonstrate the effectiveness of our algorithms.