Kernel density estimates are important for a broad variety of applications. Their construction has been well studied, but existing techniques are expensive on massive datasets, provide only heuristic approximations without theoretical guarantees, or both. We propose randomized and deterministic algorithms with quality guarantees that are orders of magnitude more efficient than previous algorithms. Our algorithms do not require knowledge of the kernel or its bandwidth parameter and are easily parallelizable. We demonstrate how to implement our ideas in a centralized setting and in MapReduce, although our algorithms are applicable to any large-scale data processing framework. Extensive experiments on large real datasets demonstrate the quality, efficiency, and scalability of our techniques.
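
To make the idea of approximating a kernel density estimate from a small subset concrete, here is a minimal sketch in Python. It is not the algorithm proposed here: it uses plain uniform random sampling as a stand-in, and the function names (`gaussian_kde`, `max_kde_error`), the synthetic data, the sample size, and the bandwidths are all assumptions chosen for illustration. It shows one property the abstract mentions: the sample is drawn once, without reference to the kernel or its bandwidth, and is then reused for several bandwidths.

```python
import math
import random


def gaussian_kde(points, x, bandwidth):
    """Evaluate a 1-D Gaussian kernel density estimate built on `points` at x."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points
    )


def max_kde_error(data, sample, queries, bandwidth):
    """Largest absolute gap between the full-data KDE and the sample KDE."""
    return max(
        abs(gaussian_kde(data, q, bandwidth) - gaussian_kde(sample, q, bandwidth))
        for q in queries
    )


# Hypothetical synthetic data: a mixture of two Gaussians (illustrative only).
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(2000)]
data += [rng.gauss(5.0, 0.5) for _ in range(2000)]

# The sample is drawn once, before any kernel or bandwidth is chosen.
sample = rng.sample(data, 1000)

# Query points on a coarse grid covering both modes.
queries = [-4.0 + 0.5 * i for i in range(25)]

# Reuse the same sample for different bandwidths.
errors = {}
for bandwidth in (0.3, 1.0):
    errors[bandwidth] = max_kde_error(data, sample, queries, bandwidth)
    print(f"bandwidth={bandwidth}: max abs error = {errors[bandwidth]:.4f}")
```

In this sketch, evaluating the sample-based estimate costs a quarter of the full evaluation per query. How small the subset can be for a given error bound is exactly the kind of question that quality guarantees address, and the sketch makes no claim about that.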