In large data recording and warehousing environments, it is often advantageous to provide fast, approximate answers to queries whenever possible. Before DBMSs that provide highly accurate approximate answers can become a reality, many new techniques for summarizing data and for estimating answers from summarized data must be developed. This paper introduces two new sampling-based summary statistics, concise samples and counting samples, and presents new techniques for their fast incremental maintenance regardless of the data distribution. We quantify their advantages over standard sample views in terms of the number of additional sample points they hold for the same view size, which in turn yields more accurate query answers. Finally, we consider their application to providing fast approximate answers to hot list queries. Our algorithms maintain their accuracy in the presence of ongoing insertions to the data warehouse.
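The abstract does not describe how a concise sample is stored or maintained, so the following is only a minimal sketch under stated assumptions: that repeated values are kept as value/count pairs (which is how more sample points could fit in the same space), and that insertions are admitted with probability 1/tau, with tau raised and the sample thinned whenever a space budget is exceeded. The class name `ConciseSample`, the footprint accounting (one word per singleton, two per pair), and the `growth` factor are hypothetical choices made for illustration, not details taken from the paper.

```python
import random
from collections import Counter


class ConciseSample:
    """Illustrative sketch only; names and parameters are assumptions."""

    def __init__(self, max_footprint, growth=1.1, rng=None):
        self.max_footprint = max_footprint  # space budget, in "words"
        self.growth = growth                # assumed threshold growth factor
        self.tau = 1.0                      # admit each insert with prob. 1/tau
        self.counts = Counter()             # value -> sampled occurrences
        self.rng = rng or random.Random()

    def footprint(self):
        # Assumed cost model: a singleton takes 1 word, a (value, count) pair 2.
        return sum(1 if c == 1 else 2 for c in self.counts.values())

    def sample_size(self):
        # Number of sample points represented, counting repeats.
        return sum(self.counts.values())

    def insert(self, value):
        # Admit the new value with probability 1/tau.
        if self.rng.random() < 1.0 / self.tau:
            self.counts[value] += 1
        # If over budget, raise the threshold and thin until we fit again.
        while self.footprint() > self.max_footprint:
            self._raise_threshold()

    def _raise_threshold(self):
        new_tau = self.tau * self.growth
        keep = self.tau / new_tau  # each sample point survives with this prob.
        for value in list(self.counts):
            kept = sum(
                1 for _ in range(self.counts[value]) if self.rng.random() < keep
            )
            if kept:
                self.counts[value] = kept
            else:
                del self.counts[value]
        self.tau = new_tau

    def estimate_count(self, value):
        # Scale the sampled count back up by the current threshold.
        return self.counts.get(value, 0) * self.tau
```

Under this cost model the sample size is never smaller than the footprint, and on skewed data it can be much larger, which is the kind of advantage over a plain sample of the same size that the abstract refers to. A hot list query could then be approximated by reporting the values with the largest `estimate_count`.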