Indexable PLA for efficient similarity search

  • Authors:
  • Qiuxia Chen;Lei Chen;Xiang Lian;Yunhao Liu;Jeffrey Xu Yu

  • Affiliations:
  • Hong Kong University of Science and Technology, Hong Kong, China;Hong Kong University of Science and Technology, Hong Kong, China;Hong Kong University of Science and Technology, Hong Kong, China;Hong Kong University of Science and Technology, Hong Kong, China;The Chinese University of Hong Kong, Hong Kong, China

  • Venue:
  • VLDB '07 Proceedings of the 33rd international conference on Very large data bases
  • Year:
  • 2007

Quantified Score

Hi-index 0.01

Visualization

Abstract

Similarity-based search over time-series databases has been a hot research topic for a long history, which is widely used in many applications, including multimedia retrieval, data mining, web search and retrieval, and so on. However, due to high dimensionality (i.e. length) of the time series, the similarity search over directly indexed time series usually encounters a serious problem, known as the "dimensionality curse". Thus, many dimensionality reduction techniques are proposed to break such curse by reducing the dimensionality of time series. Among all the proposed methods, only Piecewise Linear Approximation (PLA) does not have indexing mechanisms to support similarity queries, which prevents it from efficiently searching over very large time-series databases. Our initial studies on the effectiveness of different reduction methods, however, show that PLA performs no worse than others. Motivated by this, in this paper, we re-investigate PLA for approximating and indexing time series. Specifically, we propose a novel distance function in the reduced PLA-space, and prove that this function indeed results in a lower bound of the Euclidean distance between the original time series, which can lead to no false dismissals during the similarity search. As a second step, we develop an effective approach to index these lower bounds to improve the search efficiency. Our extensive experiments over a wide spectrum of real and synthetic data sets have demonstrated the efficiency and effectiveness of PLA together with the newly proposed lower bound distance, in terms of both pruning power and wall clock time, compared with two state-of-the-art reduction methods, Adaptive Piecewise Constant Approximation (APCA) and Chebyshev Polynomials (CP).