An Approximate L1-Difference Algorithm for Massive Data Streams

Authors:
J. Feigenbaum; S. Kannan; M. Strauss; M. Viswanathan
Venue:
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Year:
1999

References (8):
The space complexity of approximating the frequency moments. STOC '96: Proceedings of the twenty-eighth annual ACM symposium on Theory of computing.
Communication complexity.
Size-estimation framework with applications to transitive closure and reachability. Journal of Computer and System Sciences.
Min-wise independent permutations (extended abstract). STOC '98: Proceedings of the thirtieth annual ACM symposium on Theory of computing.
Syntactic clustering of the Web. Selected papers from the sixth international conference on World Wide Web.
Tracking join and self-join sizes in limited storage. PODS '99: Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems.
Synopsis data structures for massive data sets. Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms.
Probabilistic counting. SFCS '83: Proceedings of the 24th Annual Symposium on Foundations of Computer Science.

Cited by: 81 publications

Abstract

We give a space-efficient, one-pass algorithm for approximating the L1 difference $\sum_i |a_i - b_i|$ between two functions, when the function values $a_i$ and $b_i$ are given as data streams and their order is chosen by an adversary. Our main technical innovation is a method of constructing families $\{V_j\}$ of limited-independence random variables that are range-summable, by which we mean that the sum $\sum_{j=0}^{c-1} V_j(s)$ is computable in time polylog($c$), for all seeds $s$. These random-variable families may be of interest outside our current application domain, i.e., massive data streams generated by communication networks. Our L1-difference algorithm can be viewed as a "sketching" algorithm, in the sense of [Broder, Charikar, Frieze, and Mitzenmacher, STOC '98, pp. 327-336], and our algorithm performs better than that of Broder et al. when used to approximate the symmetric difference of two sets with small symmetric difference.
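To illustrate why range sums of random ±1 variables are useful for this problem, here is a small Python sketch built on one common way of viewing it, the unary-expansion view. Write each value $a_i \le M$ as a run of $a_i$ ones starting at offset $iM$ inside a block of width $M$. Then $|a_i - b_i|$ equals the squared L2 distance between the two unary blocks. An AMS-style counter $Z = \sum_j V_j (x_j - y_j)$ then satisfies $\mathbb{E}[Z^2] = \sum_i |a_i - b_i|$ whenever the $V_j$ are unbiased ±1 variables with $\mathbb{E}[V_j V_k] = 0$ for $j \ne k$, and 4-wise independence keeps the variance of $Z^2$ bounded. Each stream item adds one range sum of $V_j$ to $Z$, which is where range-summability enters.

This is an illustration under stated assumptions, not the authors' construction. The names `FourWiseSigns`, `l1_sketch_estimate`, `max_value`, `groups`, and `per_group` are hypothetical. The ±1 family is a swapped-in standard one: the parity of a random cubic polynomial modulo $2^{31}-1$. It is 4-wise independent but not range-summable, so `range_sum` simply loops over the range.

```python
import random
import statistics

P = 2_147_483_647  # Mersenne prime 2**31 - 1; all positions j must stay below P


class FourWiseSigns:
    """Hypothetical +/-1 family: parity of a random degree-3 polynomial mod P.

    Polynomial values at distinct points are 4-wise independent, so any four
    signs are independent; each sign has mean -1/P rather than exactly 0.
    This family is NOT range-summable: range_sum below just loops.
    """

    def __init__(self, rng: random.Random) -> None:
        self.coeffs = [rng.randrange(P) for _ in range(4)]

    def sign(self, j: int) -> int:
        h = 0
        for c in self.coeffs:  # Horner's rule for the cubic polynomial
            h = (h * j + c) % P
        return 1 if h & 1 else -1

    def range_sum(self, lo: int, hi: int) -> int:
        # Sum of V_j for lo <= j < hi in O(hi - lo) time. A range-summable
        # family would make this polylogarithmic; this sketch does not.
        return sum(self.sign(j) for j in range(lo, hi))


def l1_sketch_estimate(stream_a, stream_b, max_value, groups=5, per_group=40, seed=0):
    """Estimate sum_i |a_i - b_i| from (i, value) pairs, 0 <= value <= max_value."""
    rng = random.Random(seed)
    families = [[FourWiseSigns(rng) for _ in range(per_group)] for _ in range(groups)]
    z = [[0] * per_group for _ in range(groups)]

    def update(i, value, direction):
        # Value v at index i is the unary run of positions [i*M, i*M + v).
        lo = i * max_value
        for g in range(groups):
            for r in range(per_group):
                z[g][r] += direction * families[g][r].range_sum(lo, lo + value)

    # Z is linear in the updates, so any interleaving of the two streams
    # (including an adversarial one) produces the same counters.
    for i, a in stream_a:
        update(i, a, +1)
    for i, b in stream_b:
        update(i, b, -1)

    # Each Z**2 estimates the L1 difference: average within groups, take the median.
    means = [statistics.fmean(v * v for v in row) for row in z]
    return statistics.median(means)


a = [(0, 3), (1, 5), (2, 0)]
b = [(0, 1), (1, 5), (2, 4)]
print(l1_sketch_estimate(a, b, max_value=8))  # exact L1 difference: 2 + 0 + 4 = 6
```

Here each stream item costs time proportional to `groups * per_group * value` because of the loop in `range_sum`. With a family whose range sums take polylog($c$) time, as the abstract describes, the per-item cost would instead grow only polylogarithmically with the value. The retained state is just the `groups * per_group` counters plus four coefficients per family, independent of the number of indices.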