Processing large graphs is a recurrent challenge for modern applications. The ability to generate smaller, representative samples is useful not only for circumventing scalability issues but also, in its own right, for statistical analysis and other data mining tasks. Such purposes call for adequate sampling techniques. In this paper, we study the uniform random sampling of a connected subgraph from a graph. We require the sample to contain a prescribed number of vertices; the sampled graph is the subgraph induced by those vertices. We devise, present and discuss several algorithms that leverage three different techniques: Rejection Sampling, Random Walk and Markov Chain Monte Carlo. We empirically evaluate and compare the performance of the algorithms. We show that they are effective and efficient, but that there is a trade-off between the two that depends on the density of the graph and the sample size. We propose one novel algorithm, Neighbour Reservoir Sampling (NRS), which successfully realizes the trade-off between effectiveness and efficiency.
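To make the problem statement concrete, the following is a minimal sketch of the simplest of the three techniques, rejection sampling. It is an illustration under stated assumptions, not the paper's implementation. The graph is assumed to be undirected and stored as an adjacency dictionary, and the names `is_connected`, `sample_connected_subgraph`, `adj`, `k` and `max_tries` are hypothetical. The sketch draws `k` vertices uniformly at random and accepts the draw only if the induced subgraph is connected, so every connected induced subgraph with `k` vertices is equally likely to be returned.

```python
import random
from collections import deque


def is_connected(adj, nodes):
    """Return True if the subgraph of `adj` induced by `nodes` is connected."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    # Breadth-first search that never leaves the candidate vertex set.
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(nodes)


def sample_connected_subgraph(adj, k, rng=None, max_tries=100_000):
    """Rejection sampling of a connected induced subgraph with k vertices.

    `adj` maps each vertex to an iterable of its neighbours (assumed
    undirected). `max_tries` is an illustrative safety cap, not part of
    the technique itself.
    """
    rng = rng or random.Random()
    vertices = list(adj)
    for _ in range(max_tries):
        # Every k-subset of vertices is proposed with equal probability.
        candidate = set(rng.sample(vertices, k))
        if is_connected(adj, candidate):
            # Induced subgraph: keep every edge with both endpoints sampled.
            edges = {
                frozenset((u, v))
                for u in candidate
                for v in adj[u]
                if v in candidate
            }
            return candidate, edges
    raise RuntimeError("no connected sample found within max_tries")


# Usage on a path graph 0 - 1 - 2 - 3 with a sample size of 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
nodes, edges = sample_connected_subgraph(path, 3, random.Random(0))
```

The acceptance rate of this approach equals the fraction of `k`-vertex subsets that induce a connected subgraph, so the expected number of proposals grows quickly on sparse graphs. This is one way to see why the cost of a sampler can depend on graph density and sample size.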