When processing massive data sets, a core task is to construct synopses of the data. To be useful, a synopsis data structure should be easy to construct while also yielding good approximations of the relevant properties of the data set. A particularly useful class of synopses is sketches, i.e., those based on linear projections of the data. These are applicable in many models including various parallel, stream, and compressed sensing settings. A rich body of analytic and empirical work exists for sketching numerical data such as the frequencies of a set of entities. Our work investigates graph sketching where the graphs of interest encode the relationships between these entities. The main challenge is to capture this richer structure and build the necessary synopses with only linear measurements. In this paper we consider properties of graphs including the size of the cuts, the distances between nodes, and the prevalence of dense sub-graphs. Our main result is a sketch-based sparsifier construction: we show that Õ(nε⁻²) random linear projections of a graph on n nodes suffice to (1+ε)-approximate all cut values. Similarly, we show that Õ(ε⁻²) linear projections suffice for (additively) approximating the fraction of induced sub-graphs that match a given pattern such as a small clique. Finally, for distance estimation we present sketch-based spanner constructions. In this last result the sketches are adaptive, i.e., the linear projections are performed in a small number of batches where each projection may be chosen dependent on the outcome of earlier sketches. All of the above results immediately give rise to data stream algorithms that also apply to dynamic graph streams where edges are both inserted and deleted. The non-adaptive sketches, such as those for sparsification and subgraphs, give us single-pass algorithms for distributed data streams with insertions and deletions. The adaptive sketches can be used to analyze MapReduce algorithms that use a small number of rounds.
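The reason linear sketches carry over to dynamic and distributed graph streams can be seen in a toy example. The following is only a minimal sketch under stated assumptions, not the paper's sparsifier or spanner construction: it projects the edge-indicator vector of a graph with dense random ±1 rows, and the helper names `make_projection` and `update` are hypothetical. Because the projection is linear, a deletion exactly cancels the matching insertion, and sketches built at separate sites can be added coordinate-wise.

```python
import random
from itertools import combinations

def make_projection(n, k, seed=0):
    """Hypothetical helper: k random +/-1 rows over all n*(n-1)/2 possible edges."""
    rng = random.Random(seed)
    pairs = list(combinations(range(n), 2))
    index = {edge: i for i, edge in enumerate(pairs)}
    rows = [[rng.choice((-1, 1)) for _ in pairs] for _ in range(k)]
    return index, rows

def update(sketch, index, rows, u, v, delta):
    """Apply an edge insertion (delta=+1) or deletion (delta=-1) in place."""
    j = index[(min(u, v), max(u, v))]
    for i, row in enumerate(rows):
        sketch[i] += delta * row[j]

n, k = 5, 4  # toy sizes, chosen only for illustration
index, rows = make_projection(n, k)

# Dynamic stream: insert (0,1), insert (1,2), then delete (0,1).
a = [0] * k
for u, v, d in [(0, 1, +1), (1, 2, +1), (0, 1, -1)]:
    update(a, index, rows, u, v, d)

# Stream that only ever inserts (1,2).
b = [0] * k
update(b, index, rows, 1, 2, +1)

# Linearity: the deleted edge leaves no trace in the sketch.
same_after_delete = (a == b)

# Distributed setting: two sites sketch different edges with the same projection.
site1 = [0] * k
update(site1, index, rows, 2, 3, +1)
site2 = [0] * k
update(site2, index, rows, 3, 4, +1)
merged = [x + y for x, y in zip(site1, site2)]

# A single site that sees both edges.
whole = [0] * k
update(whole, index, rows, 2, 3, +1)
update(whole, index, rows, 3, 4, +1)

# Linearity: the sum of the site sketches equals the sketch of the union.
merges_correctly = (merged == whole)
```

Both `same_after_delete` and `merges_correctly` hold for any choice of random rows, since they follow from linearity alone. Recovering cut values or subgraph counts from such projections requires the structured measurements developed in the paper, which this example does not attempt.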