In the windowed data stream model, we observe items arriving over time. At any time $t$, we consider the window of the last $N$ observations $a_{t-(N-1)}, a_{t-(N-2)}, \ldots, a_t$, where each $a_i \in \{1, \ldots, u\}$, and we are required to support queries about the data in this window. A crucial restriction is that we are allowed only $o(N)$ storage space (often polylogarithmic in $N$), so not all items within the window can be archived.

We study two basic problems in the windowed data stream model. The first is estimating the rarity of items in the window. The second is estimating the similarity between two data stream windows using the Jaccard coefficient. Estimating rarity and similarity has many applications in mining massive data sets. We present novel, simple algorithms for estimating rarity and similarity on windowed data streams that are accurate up to a factor of $1 \pm \epsilon$ while using space only logarithmic in the window size.
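The abstract does not spell out the method. The sketch below illustrates one standard approach to both problems, min-wise hashing adapted to a sliding window, and should be read as an assumption rather than the paper's exact algorithm. Several further assumptions apply:

- "Rarity" is taken to mean the fraction of distinct items in the window that occur exactly once.
- Similarity is the Jaccard coefficient $|A \cap B| / |A \cup B|$ of the distinct items in two windows.
- For each of $k$ hash functions, a deque keeps only the items that could still become the window minimum, together with each item's previous occurrence.
- The names `WindowMinHash` and `jaccard` are hypothetical.
- A salted 64-bit BLAKE2b hash stands in for a min-wise independent hash family.

```python
import hashlib
from collections import deque


class WindowMinHash:
    """Hypothetical min-wise hashing sketch over the last N items of a stream.

    For each of k hash functions we keep a deque of entries
    [hash, item, last_seen, prev_seen] whose hashes increase from front to
    back, so the front entry is the window item with the minimum hash.
    prev_seen (None if unknown) is the item's previous occurrence, which is
    enough to decide whether the window minimum occurs exactly once.
    """

    def __init__(self, window: int, k: int = 64, seed: int = 0):
        self.N = window
        self.k = k
        self.seed = seed
        self.t = -1  # position of the most recent item
        self.queues = [deque() for _ in range(k)]

    def _hash(self, i: int, item) -> int:
        # Assumption: salted BLAKE2b approximates the i-th min-wise hash.
        data = repr((self.seed, i, item)).encode()
        return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

    def add(self, item) -> None:
        self.t += 1
        oldest = self.t - self.N + 1  # first position still inside the window
        for i, q in enumerate(self.queues):
            h = self._hash(i, item)
            prev = None
            # Entries with hash >= h can never again be the window minimum.
            while q and q[-1][0] >= h:
                old = q.pop()
                if old[1] == item:
                    prev = old[2]  # remember this item's previous occurrence
            q.append([h, item, self.t, prev])
            # Timestamps increase front to back, so expired entries sit in front.
            while q[0][2] < oldest:
                q.popleft()

    def minima(self):
        """Item with the minimum hash in the window, per hash function."""
        return [q[0][1] for q in self.queues]

    def rarity(self) -> float:
        """Estimate the fraction of distinct window items that occur exactly once."""
        oldest = self.t - self.N + 1
        once = sum(1 for q in self.queues if q[0][3] is None or q[0][3] < oldest)
        return once / self.k


def jaccard(a: WindowMinHash, b: WindowMinHash) -> float:
    """Estimate |A ∩ B| / |A ∪ B| for the distinct items of two windows."""
    assert a.k == b.k and a.seed == b.seed, "sketches must share hash functions"
    return sum(x == y for x, y in zip(a.minima(), b.minima())) / a.k
```

Both estimators rest on the same fact. Under a min-wise hash, the minimum of a set is equally likely to be any of its distinct items, so each hash function gives an unbiased 0/1 trial.

In the rarity trial, the front item's `prev_seen` is `None` only if its earlier occurrence has expired, or if that occurrence was pushed out by a later item with a smaller hash. In the second case, the front item cannot reach the front while that earlier occurrence is still in the window, so the test stays correct.

For random hashes, the number of entries per deque is the number of suffix minima in the window. Its expected value is at most about $\ln N$, which matches the logarithmic space bound stated in the abstract. The parameter `k` trades space for accuracy.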