TSA-Tree: A Wavelet-Based Approach to Improve the Efficiency of Multi-Level Surprise and Trend Queries on Time-Series Data

Authors:
Cyrus Shahabi; Xiaoming Tian; Wugang Zhao
Venue:
SSDBM '00 Proceedings of the 12th International Conference on Scientific and Statistical Database Management
Year:
2000

Abstract

We introduce a novel wavelet-based tree structure, termed TSA-tree, which improves the efficiency of multi-level trend and surprise queries on time-sequence data. With the explosion of scientific observation data (some of it conceptualized as time sequences), we face the challenge of efficiently storing, retrieving, and analyzing these data. Typical queries on such data sets look for trends (e.g., global warming) or surprises (e.g., an undersea volcano eruption) within the original time series. The challenge, however, is that these trend and surprise queries are needed at different levels of abstraction (e.g., within the last week, month, year, or decade). Supporting such multi-level queries can require retrieving and processing a large, sometimes huge, subset of the raw data. To expedite this process, we use the TSA-tree. The nodes of the TSA-tree contain pre-computed trends and surprises at different levels, and they are constructed by applying the wavelet transform recursively. As a result, each node of the TSA-tree is readily available for visualizing trends and surprises. In addition, each node is significantly smaller than the original time series, resulting in faster I/O operations. A limitation of the TSA-tree, however, is that its total size is larger than that of the original time series. To address this shortcoming, we first prove that the storage space required for the optimal subtree of the TSA-tree (OTSA-tree) is no more than that required for the original time series, without losing any information. Next, we propose two alternative techniques to reduce the size of the OTSA-tree even further while maintaining acceptable query precision compared with querying the original time sequences. Using real and synthetic time-sequence databases, we compare our techniques with well-known algorithms such as DFT and SVD in terms of both performance and query precision. The results indicate the superiority of our approach.
Finally, we show that our techniques scale as either the database size or the length of the time sequences increases.
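The recursive construction and the OTSA-tree storage argument can be illustrated with a minimal Python sketch. It assumes Haar-style pairwise averaging for the trend node and pairwise half-differencing for the surprise node, and it assumes that only trend nodes are decomposed further. The paper's exact wavelet filters, normalization, and node layout may differ, and all function names (`haar_step`, `build_tsa_tree`, `otsa_leaves`, `reconstruct`, `top_surprises`, `tsa_tree_size`) are hypothetical. The sketch shows why keeping only the leaves (the last trend node plus every surprise node) can take exactly as many values as the original series while remaining lossless.

```python
from typing import List, Tuple

Series = List[float]


def haar_step(x: Series) -> Tuple[Series, Series]:
    """One Haar-style level (assumed normalization).

    Returns (trend, surprise): pairwise averages and pairwise half-differences.
    """
    if len(x) % 2:
        raise ValueError("series length must be even at every level")
    trend = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    surprise = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return trend, surprise


def build_tsa_tree(x: Series, levels: int) -> Tuple[List[Series], List[Series]]:
    """Build trend nodes AX_1..AX_L and surprise nodes DX_1..DX_L.

    Assumption: only the trend node of each level is split again.
    """
    trends: List[Series] = []
    surprises: List[Series] = []
    current = list(x)
    for _ in range(levels):
        a, d = haar_step(current)
        trends.append(a)
        surprises.append(d)
        current = a
    return trends, surprises


def otsa_leaves(trends: List[Series], surprises: List[Series]) -> List[Series]:
    """Leaf nodes of the assumed tree: AX_L followed by DX_1..DX_L."""
    return [trends[-1]] + surprises


def reconstruct(last_trend: Series, surprises: List[Series]) -> Series:
    """Invert the transform from the leaves: x0 = a + d, x1 = a - d."""
    current = list(last_trend)
    for d in reversed(surprises):
        expanded: Series = []
        for a_i, d_i in zip(current, d):
            expanded.extend([a_i + d_i, a_i - d_i])
        current = expanded
    return current


def top_surprises(surprises: List[Series], level: int, k: int):
    """Hypothetical surprise query: the k largest |detail| values at a level,
    each mapped to the half-open range of original positions it covers."""
    d = surprises[level - 1]
    width = 2 ** level
    ranked = sorted(range(len(d)), key=lambda j: abs(d[j]), reverse=True)[:k]
    return [(j * width, (j + 1) * width, d[j]) for j in ranked]


def tsa_tree_size(n: int, levels: int) -> int:
    """Values stored by the full assumed tree: root plus two nodes per level."""
    return n + sum(2 * (n >> lvl) for lvl in range(1, levels + 1))


if __name__ == "__main__":
    x = [1, 1, 1, 1, 9, 1, 1, 1]
    trends, surprises = build_tsa_tree(x, 3)
    leaves = otsa_leaves(trends, surprises)
    print("full tree size:", tsa_tree_size(len(x), 3))
    print("leaf size:", sum(len(node) for node in leaves))
    print("lossless:", reconstruct(trends[-1], surprises) == x)
    print("top surprise at level 1:", top_surprises(surprises, 1, 1))
```

For the 8-point series above with 3 levels, the full assumed tree stores 22 values, while the leaves store 8 (1 + 4 + 2 + 1), the same as the original series, and the series is recovered exactly from them. The spike at position 4 shows up as the largest level-1 surprise, covering positions 4 to 5. Storing every intermediate node speeds up multi-level queries; storing only the leaves trades some of that speed for no storage overhead.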