Complex queries over high-speed data streams often need to rely on approximations to keep up with their input. The research community has developed a rich literature on approximate streaming algorithms for this application. Many of these algorithms produce samples of the input stream with better properties than conventional random sampling. In this paper, we abstract the stream sampling process and design a new stream sample operator. We show how it can be used to implement a wide variety of algorithms that perform sampling and sampling-based aggregations. We also show how to implement the operator in Gigascope, a high-speed stream database specialized for IP network monitoring applications. As a case study, we apply the operator within this enhanced Gigascope to perform subset-sum sampling, which is of great interest for IP network management. We evaluate this implementation on a live, high-speed Internet traffic data stream and find that (a) the operator is a flexible, versatile addition to Gigascope, suitable for tuning and algorithm engineering, and (b) the operator imposes only a small evaluation overhead. To our knowledge, this is the first operational implementation of a wide variety of stream sampling algorithms running at line speed within a data stream management system.
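To make subset-sum sampling inside a windowed stream operator more concrete, here is a minimal Python sketch of threshold-based subset-sum sampling for a single window. A tuple of weight x (for example, packet bytes) is kept with probability min(1, x/z) and, if kept, reports the estimate max(x, z), which is unbiased for x. This is an illustration under stated assumptions, not the Gigascope operator or its query syntax: the class name `SubsetSumSampler`, the fixed sample budget `capacity`, and the rule of multiplying the threshold `z` by a constant `growth` factor during a "cleaning" step are all hypothetical choices made for this sketch.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SubsetSumSampler:
    """Hypothetical sketch of a windowed subset-sum sampling operator.

    A tuple of weight x is kept with probability min(1, x / z); a kept
    tuple reports the estimate max(x, z). When the sample exceeds
    `capacity`, a cleaning step raises z and re-samples the retained
    tuples so that every estimate stays unbiased.
    """

    capacity: int                # assumed fixed sample budget (>= 1)
    threshold: float = 1.0       # current threshold z
    growth: float = 2.0          # assumed factor by which z is raised
    rng: random.Random = field(default_factory=random.Random)
    sample: list = field(default_factory=list)  # retained (key, x) pairs

    def process(self, key, weight):
        """Per-tuple step: keep the tuple with probability min(1, x / z)."""
        if weight <= 0:
            return
        if weight >= self.threshold or self.rng.random() < weight / self.threshold:
            self.sample.append((key, weight))
        # Clean until the sample fits its budget again.
        while len(self.sample) > self.capacity:
            self._clean()

    def _clean(self):
        """Raise z and thin the sample.

        A retained tuple currently represents max(x, z_old). Keeping it with
        probability min(1, max(x, z_old) / z_new) makes its overall inclusion
        probability min(1, x / z_new), so the estimates remain unbiased.
        """
        old, new = self.threshold, self.threshold * self.growth
        kept = []
        for key, x in self.sample:
            current = max(x, old)
            if current >= new or self.rng.random() < current / new:
                kept.append((key, x))
        self.sample = kept
        self.threshold = new

    def flush(self):
        """End of window: emit (key, estimated weight) pairs, clear the sample.

        z is carried over as the starting threshold for the next window,
        which is an assumed policy for this sketch.
        """
        out = [(key, max(x, self.threshold)) for key, x in self.sample]
        self.sample = []
        return out


if __name__ == "__main__":
    sampler = SubsetSumSampler(capacity=50, rng=random.Random(1))
    for i in range(2000):
        sampler.process(i, 40 + (i % 1460))  # synthetic packet sizes
    estimates = sampler.flush()
    print(len(estimates), sum(w for _, w in estimates))
```

Raising `z` by a constant factor keeps the number of cleaning passes logarithmic in the final threshold. A real operator might instead choose the new threshold from the retained weights so that the cleaned sample lands closer to its budget.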