Current research in indexing and mining time series data has produced many interesting algorithms and representations. However, the algorithms and the size of data considered have generally not been representative of the increasingly massive datasets encountered in science, engineering, and business domains. In this work, we introduce a novel multi-resolution symbolic representation that can be used to index datasets several orders of magnitude larger than anything else considered in the literature. To demonstrate the utility of this representation, we constructed a simple tree-based index structure that supports fast exact search and approximate search that is orders of magnitude faster. For example, with a database of one hundred million time series, the approximate search can retrieve high-quality nearest neighbors in slightly over a second, whereas a sequential scan would take tens of minutes. Our experimental evaluation demonstrates that our representation allows index performance to scale well with increasing dataset sizes. Additionally, we provide analysis concerning parameter sensitivity, approximate search effectiveness, and lower bound comparisons between time series representations in a bit-constrained environment. We further show how to exploit the combination of exact and approximate search as subroutines in data mining algorithms, allowing for the exact mining of truly massive real-world datasets containing tens of millions of time series.
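The abstract does not spell out how the multi-resolution symbolic representation is built, so the following is only a minimal sketch under stated assumptions: a SAX-style scheme in which a series is z-normalized, averaged into equal-width segments, and each segment mean is mapped to an integer symbol using equiprobable breakpoints of the standard normal distribution. The function names (`znorm`, `paa`, `symbolize`, `coarsen`) and the parameter choices are hypothetical and are not taken from the paper. The sketch illustrates one way a symbolic word can be read at several resolutions: when cardinalities are powers of two, dropping low-order bits of a fine symbol gives the symbol at a coarser cardinality.

```python
from statistics import NormalDist, mean, pstdev


def znorm(series):
    """Z-normalize a series (a constant series maps to all zeros)."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise aggregate approximation: the mean of each equal-width segment."""
    if len(series) % segments != 0:
        raise ValueError("series length must be a multiple of the segment count")
    width = len(series) // segments
    return [mean(series[i * width:(i + 1) * width]) for i in range(segments)]


def symbolize(values, bits):
    """Map each value to an integer symbol in [0, 2**bits) using
    equiprobable breakpoints of the standard normal distribution."""
    cardinality = 2 ** bits
    breakpoints = [NormalDist().inv_cdf(k / cardinality)
                   for k in range(1, cardinality)]
    # The symbol is the number of breakpoints lying strictly below the value.
    return [sum(v > b for b in breakpoints) for v in values]


def coarsen(symbols, from_bits, to_bits):
    """Reduce resolution by dropping low-order bits. This works because the
    breakpoints for 2**to_bits are a subset of those for 2**from_bits."""
    return [s >> (from_bits - to_bits) for s in symbols]


series = [1, 2, 3, 4, 5, 6, 7, 8]
segment_means = paa(znorm(series), segments=4)
fine = symbolize(segment_means, bits=3)    # cardinality 8
coarse = symbolize(segment_means, bits=1)  # cardinality 2
print(fine, coarse, coarsen(fine, 3, 1))
```

Under these assumptions, a tree-based index could use coarse words near the root and refine individual symbols to more bits as nodes fill up, though the abstract itself does not state how the index splits nodes.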