We introduce the problem of sampling from a moving window of recent items from a data stream and develop two algorithms for this problem. The first algorithm, "chain-sample", extends reservoir sampling to deal with the expiration of data elements from the sample. The expected memory usage of our algorithm is O(k) when maintaining a sample of size k over a window of the n most recent elements from the data stream, and with high probability the algorithm requires no more than O(k log n) memory.

When the number of elements in the window is variable, as is the case when the size of the window is defined as a time duration rather than as a fixed number of data elements, the sampling problem becomes harder. Our second algorithm, "priority-sample", works even when the number of elements in the window can vary dynamically over time. With high probability, the "priority-sample" algorithm uses no more than O(k log n) memory.
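The abstract names the two algorithms without giving their mechanics, so the following is only a minimal Python sketch of how they could be realized for a sample of size one. The class names `ChainSample` and `PrioritySample`, their methods, and the use of a deque are illustrative assumptions, not the authors' code. The chain variant assumes items are numbered from 1 and the window holds the n most recent items; the priority variant assumes non-decreasing timestamps and a window of fixed time length.

```python
import random
from collections import deque


class ChainSample:
    """One uniform sample over the n most recent items (hypothetical sketch)."""

    def __init__(self, n, rng=None):
        self.n = n
        self.rng = rng if rng is not None else random.Random()
        self.i = 0              # 1-based index of the most recent item
        self.chain = deque()    # (index, value) pairs; chain[0] is the sample
        self.next_index = None  # index of the item that will extend the chain

    def _pick_successor(self):
        # The replacement is drawn uniformly from the n items that follow item i.
        self.next_index = self.rng.randint(self.i + 1, self.i + self.n)

    def add(self, value):
        self.i += 1
        if self.rng.random() < 1.0 / min(self.i, self.n):
            # Reservoir step: the new item becomes the sample; drop the old chain.
            self.chain.clear()
            self.chain.append((self.i, value))
            self._pick_successor()
        elif self.i == self.next_index:
            # The awaited successor arrived: store it and pick its own successor.
            self.chain.append((self.i, value))
            self._pick_successor()
        # Drop the current sample once it has left the window.
        while self.chain and self.chain[0][0] <= self.i - self.n:
            self.chain.popleft()

    def sample(self):
        return self.chain[0][1] if self.chain else None


class PrioritySample:
    """One uniform sample over items from the last `window` time units
    (hypothetical sketch; timestamps must be non-decreasing)."""

    def __init__(self, window, rng=None):
        self.window = window
        self.rng = rng if rng is not None else random.Random()
        self.items = deque()  # (timestamp, priority, value), priorities decreasing

    def add(self, timestamp, value):
        priority = self.rng.random()
        # An older item with a lower priority can never become the sample again.
        while self.items and self.items[-1][1] < priority:
            self.items.pop()
        self.items.append((timestamp, priority, value))
        self._expire(timestamp)

    def _expire(self, now):
        while self.items and self.items[0][0] <= now - self.window:
            self.items.popleft()

    def sample(self, now):
        # The live item with the highest priority sits at the front.
        self._expire(now)
        return self.items[0][2] if self.items else None


if __name__ == "__main__":
    chain = ChainSample(n=100)
    prio = PrioritySample(window=60.0)
    for t in range(1000):
        chain.add(t)
        prio.add(float(t), t)
    print(chain.sample(), prio.sample(999.0))
```

One simple way to obtain a sample of size k from either sketch is to run k independent instances, which gives a sample with replacement. In both classes the stored items are those that may still become the sample later, which is the quantity the memory bounds in the abstract refer to.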