Approximate NN queries on streams with guaranteed error/performance bounds

Authors:
Nick Koudas;Beng Chin Ooi;Kian-Lee Tan;Rui Zhang
Affiliations:
AT&T Labs-Research;National University of Singapore, Singapore;National University of Singapore, Singapore;National University of Singapore, Singapore
Venue:
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Year:
2004

Abstract

In data stream applications, data arrive continuously and can be scanned only once, because the query processor has very limited memory (relative to the size of the stream) to work with. Hence, queries on data streams do not have access to the entire data set, and query answers are typically approximate. While there have been many studies on the k Nearest Neighbors (kNN) problem in conventional multi-dimensional databases, those solutions cannot be applied directly to data streams for these reasons. In this paper, we investigate the kNN problem over data streams. We first introduce the ε-approximate kNN (εkNN) problem, which finds approximate kNN answers for a query point Q such that the absolute error of the k-th nearest neighbor distance is bounded by ε. To support εkNN queries over streams, we propose a technique called DISC (aDaptive Indexing on Streams by space-filling Curves). DISC adapts to different data distributions to either (a) optimize memory utilization when answering εkNN queries under a given accuracy requirement or (b) achieve the best accuracy under a given memory constraint. DISC also provides efficient updates and query processing, both of which are important requirements in data stream applications. Extensive experiments on both synthetic and real data sets confirm the effectiveness and efficiency of DISC.
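To make the εkNN guarantee concrete, the following Python sketch shows one simple way a one-pass summary can bound the absolute error of the k-th nearest neighbor distance. It is not DISC itself: the names GridSummary and morton_key, the fixed grid with cell side ε/√d, the keep-the-first-k-points-per-cell policy, and the restriction to non-negative coordinates are all illustrative assumptions, and DISC's adaptive handling of data distributions and memory budgets is not reproduced. The bound holds because the retained points are a subset of the stream, so the approximate k-th distance d'_k is never below the exact d_k; and if one of the true k nearest neighbors p was discarded, its cell already holds k retained points, each within ε of p and hence within d_k + ε of Q, so d'_k ≤ d_k + ε.

```python
import heapq
import math
from collections import defaultdict


def morton_key(cell):
    """Z-order (Morton) key: interleave the bits of non-negative cell coordinates."""
    if any(c < 0 for c in cell):
        raise ValueError("sketch assumes non-negative coordinates")
    dim = len(cell)
    bits = max(c.bit_length() for c in cell)
    key = 0
    for b in range(bits):
        for i, c in enumerate(cell):
            # Bit b of coordinate i lands at position b * dim + i (injective).
            key |= ((c >> b) & 1) << (b * dim + i)
    return key


class GridSummary:
    """Hypothetical one-pass summary for eps-approximate kNN (not the paper's DISC).

    Cells have side eps / sqrt(dim), so two points in the same cell are closer
    than eps. Each cell keeps at most k points; later arrivals are discarded.
    """

    def __init__(self, k, eps, dim):
        self.k = k
        self.side = eps / math.sqrt(dim)
        self.cells = defaultdict(list)  # Morton key -> retained points

    def insert(self, p):
        # Each stream point is seen exactly once.
        cell = tuple(math.floor(x / self.side) for x in p)
        bucket = self.cells[morton_key(cell)]
        if len(bucket) < self.k:
            bucket.append(p)

    def size(self):
        """Number of retained points (at most k per occupied cell)."""
        return sum(len(b) for b in self.cells.values())

    def knn(self, q):
        # Brute force over retained points, nearest first.
        retained = (p for b in self.cells.values() for p in b)
        return heapq.nsmallest(self.k, retained, key=lambda p: math.dist(p, q))
```

Keying cells by their Z-order (Morton) code mirrors the space-filling-curve idea in DISC's name; a real index would keep these keys in an ordered structure such as a B+-tree so that a query can visit nearby cells first, whereas the sketch simply scans every retained point for clarity. Memory is bounded by k points per occupied cell, independent of the stream length.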