To study the characteristics of network domains efficiently and to support the development of network systems (e.g., algorithms and protocols that operate on networks), it is often necessary to sample a representative subgraph from a large complex network. Although recent subgraph sampling methods have been shown to work well, they focus on sampling from memory-resident graphs and assume that the sampling algorithm can access the entire graph when deciding which nodes and edges to select. Many large-scale network datasets, however, are too large and/or too dynamic to be processed in main memory (e.g., email, tweets, wall posts). In this work, we formulate the problem of sampling from large graph streams. We propose a streaming graph sampling algorithm that dynamically maintains a representative sample in a reservoir-based setting. We evaluate the efficacy of our proposed methods empirically on several real-world datasets. Across all datasets, we found that our method produces samples that better preserve the original graph distributions.
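To make the reservoir-based setting concrete, the following is a minimal sketch of plain uniform reservoir sampling applied to a stream of edges. It is not the algorithm proposed in this work, which is not specified in the abstract; it only illustrates how a fixed-size sample can be maintained in one pass without holding the full graph in memory. The names `reservoir_sample_edges` and `sampled_nodes`, the parameter `k`, and the toy path-graph stream are assumptions introduced for illustration.

```python
import random
from typing import Hashable, Iterable, List, Optional, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def reservoir_sample_edges(
    stream: Iterable[Edge], k: int, seed: Optional[int] = None
) -> List[Edge]:
    """Keep a uniform random sample of at most k edges from a one-pass stream.

    Illustrative baseline only (classic uniform reservoir sampling),
    not the method proposed in the abstract above.
    """
    rng = random.Random(seed)
    reservoir: List[Edge] = []
    for t, edge in enumerate(stream, start=1):
        if t <= k:
            # Fill phase: the first k edges are always kept.
            reservoir.append(edge)
        else:
            # Replacement phase: the t-th edge enters with probability k / t,
            # evicting a uniformly chosen resident edge.
            j = rng.randrange(t)
            if j < k:
                reservoir[j] = edge
    return reservoir


def sampled_nodes(edges: Iterable[Edge]) -> Set[Hashable]:
    """Return the node set of the subgraph formed by the sampled edges."""
    return {node for edge in edges for node in edge}


# Hypothetical usage: a toy stream of 1000 edges forming a path graph.
stream = ((i, i + 1) for i in range(1000))
sample = reservoir_sample_edges(stream, k=50, seed=0)
nodes = sampled_nodes(sample)
```

Memory use is bounded by `k` regardless of stream length, which is the property that makes a reservoir attractive for graphs that cannot be processed in main memory. Uniform edge sampling does not by itself guarantee that the sampled subgraph preserves the properties of the original graph.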