On finding common neighborhoods in massive graphs

Authors:
Adam L. Buchsbaum; Raffaele Giancarlo; Jeffery R. Westbrook
Affiliations:
AT&T Labs, Shannon Laboratory, 180 Park Avenue, Florham Park, NJ
Dipartimento di Matematica ed Applicazioni, Università di Palermo, Via Archirafi 34, 90123 Palermo, Italy
4031 South Hempstead Circle, San Diego, CA
Venue:
Theoretical Computer Science
Year:
2003

Abstract

We consider the problem of finding pairs of vertices that share large common neighborhoods in massive graphs. We prove lower bounds on the resources needed to solve this problem on resource-bounded models of computation. In streaming models, in which algorithms can access the input only a constant number of times and only sequentially, we show that, even with randomization, any algorithm that determines whether some pair of vertices has a large common neighborhood must essentially store and process the input graph offline. In sampling models, in which algorithms can only query an oracle for the common neighborhoods of specified vertex pairs, we show that any algorithm must query the oracle on almost every pair of vertices.
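To make the problem concrete, the following is a minimal sketch of the straightforward offline strategy that the streaming lower bound says is essentially unavoidable: buffer the entire edge stream in memory, then count, for every pair of vertices, how many vertices are adjacent to both. It assumes an undirected simple graph given as an iterable of (u, v) integer pairs. The function name and the `threshold` parameter (a stand-in for "large") are hypothetical illustrations, not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def pairs_with_large_common_neighborhood(
    edges: Iterable[Edge], threshold: int
) -> List[Tuple[int, int, int]]:
    """Return sorted (u, v, |N(u) & N(v)|) for every pair u < v whose
    common neighborhood has at least `threshold` vertices.

    Assumes an undirected graph; `threshold` is a hypothetical stand-in
    for "large". The whole input is stored before any pair is examined.
    """
    # One sequential pass over the stream: buffer the graph as adjacency sets.
    # Self-loops are ignored; duplicate edges collapse into the sets.
    adj: Dict[int, Set[int]] = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Each vertex w adds 1 to the common-neighborhood size of every pair
    # of its neighbors (each path u - w - v, a "wedge").
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for nbrs in adj.values():
        for u, v in combinations(sorted(nbrs), 2):
            counts[(u, v)] += 1

    return sorted((u, v, c) for (u, v), c in counts.items() if c >= threshold)


if __name__ == "__main__":
    # Complete bipartite graph K_{2,3}: {0, 1} versus {2, 3, 4}.
    k23 = [(0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4)]
    print(pairs_with_large_common_neighborhood(k23, 3))  # [(0, 1, 3)]
```

The adjacency sets take space proportional to the number of vertices plus edges, and the counting loop touches every pair of neighbors of every vertex, so the work grows with the sum of squared degrees. This explicit storage of the whole graph is the cost that, per the abstract, a constant-pass streaming algorithm cannot substantially avoid.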