Numerous massive data sets, ranging from flows of Internet traffic to logs of supermarket transactions, have emerged during the past few years. Their overwhelming size and the typically restricted access to them call for new computational models. This thesis studies three such models: sampling computations, data stream computations, and sketch computations. While most previous work focused on designing algorithms in these models, this thesis revolves around their limitations. We develop a suite of lower bound techniques that characterize the complexity of functions in these models, indicating which problems can be solved efficiently in them. We derive specific bounds for a multitude of practical problems arising in database, networking, and information retrieval applications, such as frequency statistics, selection functions, statistical moments, and distance estimation.
We present general, powerful, and easy-to-use lower bound techniques for the sampling model. The techniques apply to all functions, address both oblivious and adaptive sampling, and frequently produce optimal bounds for a wide range of functions. They are stated in terms of new combinatorial and statistical properties of functions that are easy to calculate.
We obtain lower bounds for the data stream and sketch models through one-way and simultaneous communication complexity, respectively. We develop lower bounds for the latter via a new information-theoretic view of communication complexity. A highlight of this work is an optimal simultaneous communication complexity lower bound for the important multi-party set-disjointness problem.
Finally, we present a powerful method for proving lower bounds for general communication complexity. The method is based on a direct sum property of a new complexity measure for communication protocols and on a novel statistical view of communication complexity.
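To make the simultaneous model concrete, here is a minimal Python sketch of multi-party set-disjointness under one common promise formulation, assumed here rather than taken from the thesis: t players each hold a subset of {0, ..., n-1}, and the sets are promised either to be pairwise disjoint or to share exactly one common element while being otherwise disjoint. In the simultaneous model each player sends a single message to a referee, who must decide which case holds. The names `encode`, `referee`, and `trivial_simultaneous_protocol` are hypothetical, and the protocol shown is only the trivial n-bits-per-player upper bound that lower bounds are measured against.

```python
from typing import List, Set, Tuple


def encode(player_set: Set[int], n: int) -> List[int]:
    """A player's only message: the n-bit characteristic vector of its set."""
    return [1 if j in player_set else 0 for j in range(n)]


def referee(messages: List[List[int]]) -> bool:
    """Answer True iff some element appears in every player's set.

    Under the (assumed) promise, this separates the uniquely intersecting
    case (True) from the pairwise disjoint case (False).
    """
    n = len(messages[0])
    return any(all(m[j] == 1 for m in messages) for j in range(n))


def trivial_simultaneous_protocol(sets: List[Set[int]], n: int) -> Tuple[bool, int]:
    """Run the trivial protocol; return the referee's answer and total bits sent."""
    # Players never see each other's inputs: each message depends on one set only.
    messages = [encode(s, n) for s in sets]
    total_bits = sum(len(m) for m in messages)
    return referee(messages), total_bits


if __name__ == "__main__":
    print(trivial_simultaneous_protocol([{0, 3}, {1, 4}, {2, 5}], 6))  # (False, 18)
    print(trivial_simultaneous_protocol([{0, 2}, {2, 4}, {2, 5}], 6))  # (True, 18)
```

The trivial protocol costs t·n bits in total; a lower bound in this model states how much communication any correct (possibly randomized) simultaneous protocol must still use.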
We use this method to obtain improved communication complexity and data stream lower bounds for several problems, including multi-party set-disjointness, frequency moments, and L_p distance estimation. These results solve open problems of Alon, Matias, and Szegedy and of Saks and Sun.
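The connection between set-disjointness and frequency moments can be illustrated with a classic reduction of the kind used by Alon, Matias, and Szegedy; the exact reduction in the thesis may differ. Concatenate the players' sets into one stream. If the sets are pairwise disjoint, every item has frequency 1, so F_k equals the stream length; if they share one element, that element has frequency t and contributes t^k. The names below (`frequency_moment`, `players_stream`, `is_intersecting`) are hypothetical, and `frequency_moment` computes F_k exactly rather than with a small-space streaming algorithm.

```python
from collections import Counter
from typing import Iterable, List, Set


def frequency_moment(stream: Iterable[int], k: int) -> int:
    """Exact F_k: the sum over distinct items of (number of occurrences) ** k."""
    return sum(count ** k for count in Counter(stream).values())


def players_stream(sets: List[Set[int]]) -> List[int]:
    """Player 1's elements come first, then player 2's, and so on."""
    return [x for s in sets for x in sorted(s)]


def is_intersecting(sets: List[Set[int]], k: int = 2) -> bool:
    """Decide the set-disjointness promise problem from F_k of the combined stream.

    Disjoint case: every item occurs once, so F_k == len(stream).
    Intersecting case (t >= 2 players, k >= 2): the common item contributes
    t ** k instead of t, so F_k > len(stream).
    """
    stream = players_stream(sets)
    return frequency_moment(stream, k) > len(stream)


if __name__ == "__main__":
    print(is_intersecting([{0, 3}, {1, 4}, {2, 5}]))  # False (F_2 = 6)
    print(is_intersecting([{0, 2}, {2, 4}, {2, 5}]))  # True (F_2 = 12)
```

In the lower-bound argument, a streaming algorithm for (approximate) F_k is run across the players: each player feeds its part of the stream into the algorithm and forwards the memory state to the next player. This yields a one-way protocol whose communication is at most (t - 1) times the algorithm's space, so a communication lower bound for set-disjointness becomes a space lower bound for F_k. An approximate estimator suffices when t^k is large compared with n, because the two cases then differ by more than the approximation factor.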