Efficient range-constrained similarity search on wavelet synopses over multiple streams

Authors:
Hao-Ping Hung;Ming-Syan Chen
Affiliations:
National Taiwan University;National Taiwan University
Venue:
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Year:
2006


Abstract

Because resources are limited in data stream environments, answering user queries from the wavelet synopsis of a stream has been reported to be an essential capability of a Data Stream Management System (DSMS). Motivated by the fact that a user may be interested in an arbitrary range of the data streams, this paper investigates two important types of range-constrained queries in time series streaming environments: distance queries, which obtain the Euclidean distance between two streams, and kNN queries, which discover the k nearest neighbors of a reference stream. To process these two types of queries efficiently, we propose procedure RED (Range-constrained Euclidean Distance) and algorithm EKS (Enhanced KNN Search). Compared with existing methods, our approaches have two advantages. First, they process queries directly from the wavelet synopses retained in main memory, without using the inverse discrete wavelet transform (IDWT) to reconstruct the data cells, which reduces both memory and time cost. Second, they let users query the DSMS within their range of interest; unlike conventional methods, which support only full-range query processing, this increases flexibility on the client side. We evaluate procedure RED and algorithm EKS empirically on live and synthetic datasets and show that the proposed approaches are efficient for similarity search and kNN discovery within arbitrary ranges in time series streaming environments.
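To make the idea of answering range-constrained queries directly from synopses concrete, the following is a minimal Python sketch. It is not the paper's procedure RED or algorithm EKS, whose details the abstract does not give. It assumes an unnormalized Haar decomposition stored in error-tree order, a synopsis kept as a sparse dictionary of retained coefficients, and a brute-force kNN scan. All function names (`haar_decompose`, `synopsis`, `range_sq_distance`, `knn`) are hypothetical. The sketch relies on the linearity of the Haar transform: the coefficients of the difference of two streams are the differences of their coefficients, so any subtree with no retained coefficients contributes a constant value and can be summed without reconstructing its cells.

```python
import heapq
import math


def haar_decompose(values):
    """Unnormalized Haar decomposition in error-tree order (assumed layout).

    coeffs[0] is the overall average; coeffs[i] for i >= 1 is the detail
    coefficient of node i, whose children are nodes 2i and 2i + 1.
    """
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    coeffs = [0.0] * n
    level = [float(v) for v in values]
    while len(level) > 1:
        half = len(level) // 2
        averages = [(level[2 * j] + level[2 * j + 1]) / 2 for j in range(half)]
        details = [(level[2 * j] - level[2 * j + 1]) / 2 for j in range(half)]
        coeffs[half:2 * half] = details
        level = averages
    coeffs[0] = level[0]
    return coeffs


def synopsis(coeffs, budget):
    """Keep the `budget` coefficients with the largest normalized magnitude."""
    n = len(coeffs)

    def weight(i):
        support = n if i == 0 else n >> (i.bit_length() - 1)
        return abs(coeffs[i]) * math.sqrt(support)

    kept = heapq.nlargest(budget, range(n), key=weight)
    return {i: coeffs[i] for i in kept}


def range_sq_distance(syn_a, syn_b, n, start, end):
    """Squared Euclidean distance between two synopses over cells [start, end)."""
    diff = {}
    for i in set(syn_a) | set(syn_b):
        d = syn_a.get(i, 0.0) - syn_b.get(i, 0.0)
        if d != 0.0:
            diff[i] = d

    # Nodes whose subtree contains at least one nonzero detail coefficient.
    active = set()
    for i in diff:
        while i >= 1 and i not in active:
            active.add(i)
            i //= 2

    def visit(node, lo, hi, value):
        overlap = min(hi, end) - max(lo, start)
        if overlap <= 0:
            return 0.0  # node interval lies outside the query range
        if node not in active:
            # The difference is constant over this interval: no descent needed.
            return value * value * overlap
        c = diff.get(node, 0.0)
        mid = (lo + hi) // 2
        return (visit(2 * node, lo, mid, value + c)
                + visit(2 * node + 1, mid, hi, value - c))

    return visit(1, 0, n, diff.get(0, 0.0))


def knn(reference, candidates, n, start, end, k):
    """Brute-force kNN: names of the k candidate synopses closest to `reference`."""
    return heapq.nsmallest(
        k, candidates,
        key=lambda name: range_sq_distance(reference, candidates[name], n, start, end))
```

With a budget equal to the stream length, `range_sq_distance` returns the exact squared distance over the range; with a smaller budget it returns the distance between the two approximations. The traversal descends only into subtrees that overlap the range and hold a retained coefficient, so under these assumptions it visits on the order of B log n nodes, where B is the number of retained coefficients, rather than every cell in the range.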