TSA-Tree: A Wavelet-Based Approach to Improve the Efficiency of Multi-Level Surprise and Trend Queries on Time-Series Data

  • Authors:
  • Cyrus Shahabi;Xiaoming Tian;Wugang Zhao

  • Venue:
  • SSDBM '00 Proceedings of the 12th International Conference on Scientific and Statistical Database Management
  • Year:
  • 2000

Abstract

We introduce a novel wavelet-based tree structure, termed TSA-tree, which improves the efficiency of multi-level trend and surprise queries on time-sequence data. With the explosion of scientific observation data (some conceptualized as time sequences), we face the challenge of efficiently storing, retrieving and analyzing this data. Frequent queries on such data look for trends (e.g., global warming) or surprises (e.g., an undersea volcano eruption) within the original time series. The challenge, however, is that these trend and surprise queries are needed at different levels of abstraction (e.g., within the last week, last month, last year or last decade). To support these multi-level trend and surprise queries, a sometimes huge subset of the raw data needs to be retrieved and processed. To expedite this process, we utilize our TSA-tree. Each node of the TSA-tree contains pre-computed trends and surprises at different levels. The wavelet transform is applied recursively to construct the TSA nodes. As a result, each node of the TSA-tree is readily available for visualization of trends and surprises. In addition, each node is significantly smaller than the original time series, resulting in faster I/O operations. However, a limitation of the TSA-tree is that its total size is larger than that of the original time series. To address this shortcoming, we first prove that the optimal subtree of the TSA-tree (OTSA-tree) can be stored, without losing any information, in no more space than the original time series requires. Next, we propose two alternative techniques to reduce the size of the OTSA-tree even further, while maintaining acceptable query precision compared to querying the original time sequences. Using real and synthetic time-sequence databases, we compare our techniques with well-known algorithms such as DFT and SVD in both performance and query precision. The results indicate the superiority of our approach. Finally, we show that our techniques are scalable as we increase either the database size or the length of the time sequences.
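
The recursive construction described in the abstract can be illustrated with a minimal Python sketch. The abstract does not name a wavelet filter, so the orthonormal Haar filter is assumed here purely for simplicity, and the function names (`haar_step`, `build_levels`, `reconstruct`) are hypothetical rather than taken from the paper. Each step splits a sequence into a half-length trend (scaled pairwise sums) and a half-length surprise (scaled pairwise differences), and the trend is split again at the next level. In this sketch, the final trend plus all surprise sequences stands in for the lossless subtree the abstract mentions: together they hold exactly as many values as the input and are enough to rebuild it.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(x):
    """Split an even-length sequence into (trend, surprise), each half as long.

    Assumption: orthonormal Haar filter; the abstract does not specify one.
    """
    trend = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    surprise = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return trend, surprise


def build_levels(x, levels):
    """Apply haar_step recursively to the trend.

    Returns (final_trend, surprises), where surprises[0] is the finest level.
    """
    trend = list(x)
    surprises = []
    for _ in range(levels):
        if len(trend) < 2 or len(trend) % 2 != 0:
            raise ValueError("sequence length must be divisible by 2**levels")
        trend, surprise = haar_step(trend)
        surprises.append(surprise)
    return trend, surprises


def reconstruct(trend, surprises):
    """Invert build_levels: rebuild the original sequence, coarsest level first."""
    x = list(trend)
    for surprise in reversed(surprises):
        finer = []
        for a, d in zip(x, surprise):
            finer.append((a + d) / SQRT2)
            finer.append((a - d) / SQRT2)
        x = finer
    return x


if __name__ == "__main__":
    series = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
    trend, surprises = build_levels(series, 3)
    stored = len(trend) + sum(len(s) for s in surprises)
    print("values stored:", stored, "of", len(series))
    print("reconstructed:", [round(v, 6) for v in reconstruct(trend, surprises)])
```

A coarse-level query in this sketch would read only the short trend or surprise sequence for the level it needs instead of scanning the full series, which is the I/O saving the abstract refers to.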