Indexable PLA for efficient similarity search

Authors:
Qiuxia Chen; Lei Chen; Xiang Lian; Yunhao Liu; Jeffrey Xu Yu
Affiliations:
Hong Kong University of Science and Technology, Hong Kong, China (Qiuxia Chen; Lei Chen; Xiang Lian; Yunhao Liu); The Chinese University of Hong Kong, Hong Kong, China (Jeffrey Xu Yu)
Venue:
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Year:
2007

Abstract

Similarity search over time-series databases has long been an active research topic, with applications in multimedia retrieval, data mining, web search and retrieval, and other areas. However, because time series have high dimensionality (i.e., length), similarity search over directly indexed time series usually runs into a serious problem known as the "dimensionality curse". Many dimensionality reduction techniques have therefore been proposed to break this curse by reducing the dimensionality of time series. Among the proposed methods, only Piecewise Linear Approximation (PLA) lacks an indexing mechanism to support similarity queries, which prevents it from efficiently searching very large time-series databases. Our initial studies of the effectiveness of different reduction methods, however, show that PLA performs no worse than the others. Motivated by this, we re-investigate PLA for approximating and indexing time series. Specifically, we propose a novel distance function in the reduced PLA space and prove that it lower-bounds the Euclidean distance between the original time series, which guarantees no false dismissals during similarity search. In a second step, we develop an effective approach to indexing these lower bounds to improve search efficiency. Extensive experiments over a wide spectrum of real and synthetic data sets demonstrate the efficiency and effectiveness of PLA with the newly proposed lower-bound distance, in terms of both pruning power and wall-clock time, compared with two state-of-the-art reduction methods, Adaptive Piecewise Constant Approximation (APCA) and Chebyshev Polynomials (CP).
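The abstract's central claim, that a distance defined in PLA space lower-bounds the Euclidean distance and therefore causes no false dismissals, can be illustrated with a small Python sketch. It assumes equal-length segments, a least-squares line a*t + b fitted to each segment with t = 0, ..., l - 1, and a distance equal to the Euclidean distance between the two reconstructed piecewise-linear series. Under these assumptions the bound holds because a per-segment least-squares fit is an orthogonal projection, and projections never increase distances. The function names (`pla`, `reconstruct`, `pla_distance`, `range_query`) are hypothetical, the formula is not necessarily the one defined in the paper, and the linear-scan filter-and-refine loop stands in for the paper's index structure, which is not reproduced here.

```python
import math
import random


def pla(series, num_segments):
    """Fit a least-squares line a*t + b (t = 0..l-1) to each of
    num_segments equal-length segments; return [(a, b), ...].
    Assumption: len(series) is a multiple of num_segments."""
    n = len(series)
    if num_segments <= 0 or n % num_segments != 0:
        raise ValueError("len(series) must be a positive multiple of num_segments")
    l = n // num_segments
    t_mean = (l - 1) / 2
    s_tt = sum((t - t_mean) ** 2 for t in range(l))  # zero only when l == 1
    coeffs = []
    for i in range(num_segments):
        seg = series[i * l:(i + 1) * l]
        y_mean = sum(seg) / l
        a = 0.0
        if s_tt > 0:
            a = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(seg)) / s_tt
        b = y_mean - a * t_mean
        coeffs.append((a, b))
    return coeffs


def reconstruct(coeffs, seg_len):
    """Expand PLA coefficients back into a piecewise-linear series."""
    return [a * t + b for a, b in coeffs for t in range(seg_len)]


def euclidean(x, y):
    """Exact Euclidean distance between two equal-length series."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(x, y)))


def pla_distance(q_coeffs, c_coeffs, seg_len):
    """Euclidean distance between two PLA reconstructions, computed from the
    coefficients alone: sum_t (da*t + db)^2 = da^2*S2 + 2*da*db*S1 + db^2*S0,
    where S0, S1, S2 are the sums of 1, t and t^2 over t = 0..l-1."""
    l = seg_len
    s0 = l
    s1 = l * (l - 1) / 2
    s2 = (l - 1) * l * (2 * l - 1) / 6
    total = 0.0
    for (aq, bq), (ac, bc) in zip(q_coeffs, c_coeffs):
        da, db = aq - ac, bq - bc
        total += da * da * s2 + 2 * da * db * s1 + db * db * s0
    return math.sqrt(max(total, 0.0))  # guard against tiny negative rounding


def range_query(database, query, eps, num_segments):
    """Filter-and-refine range query by linear scan (no index structure).
    For brevity the database coefficients are recomputed per query."""
    seg_len = len(query) // num_segments
    q_coeffs = pla(query, num_segments)
    hits = []
    for idx, series in enumerate(database):
        # Filter: pla_distance <= true distance, so pruning here never
        # discards a true answer (no false dismissals).
        if pla_distance(q_coeffs, pla(series, num_segments), seg_len) > eps:
            continue
        # Refine: confirm the survivors with the exact Euclidean distance.
        if euclidean(query, series) <= eps:
            hits.append(idx)
    return hits


if __name__ == "__main__":
    random.seed(0)
    db = [[random.gauss(0, 1) for _ in range(64)] for _ in range(100)]
    query = [random.gauss(0, 1) for _ in range(64)]
    eps = 10.0
    brute = [i for i, s in enumerate(db) if euclidean(query, s) <= eps]
    print(range_query(db, query, eps, 8) == brute)  # True: same answers as a full scan
```

In a real system the database coefficients would be computed once and stored, which is what makes the filter step cheaper than the exact distance; how those stored lower bounds are organized in an index is the paper's second contribution and is not modeled by this sketch.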