Time series analysis, an application of high-dimensional data mining, is a common task in biochemistry, meteorology, climate research, biomedicine, and marketing. Similarity search over data of increasing dimensionality leads to exponential growth of the search space, a phenomenon known as the curse of dimensionality. A common way to postpone this effect is to approximate the original data, reducing its dimensionality before indexing. However, approximation loses information, which also leads to exponential growth of the search space. Therefore, indexing an approximation with high dimensionality, i.e., high quality, is desirable. We introduce Symbolic Fourier Approximation (SFA) and the SFA trie, which together allow indexing not only of large datasets but also of high-dimensional approximations. They do so by exploiting the trade-off between the quality of the approximation and the degeneration of the index, using a variable number of dimensions to represent each approximation. Our experiments show that SFA combined with the SFA trie scales to 5 to 10 times more indexed dimensions than previous approaches. As a result, it reduces page accesses by a factor of 2 to 25 and CPU costs by a factor of 2 to 11 for exact similarity search on real-world and synthetic data.
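The abstract does not spell out how an SFA word is built, so the following is only a minimal sketch of the general idea suggested by the name: approximate each series by its first few Fourier coefficients, then map each coefficient value to a symbol. The function names (dft_prefix, learn_bins, to_word), the orthonormal scaling, and the equi-depth binning are assumptions made for illustration, not details taken from the paper.

```python
import bisect
import cmath
import math


def dft_prefix(series, n_coeffs):
    """Real and imaginary parts of the first n_coeffs DFT coefficients.

    Uses the orthonormal scaling 1/sqrt(n), so Euclidean distance on any
    subset of coefficients never exceeds the distance on the raw series.
    """
    n = len(series)
    features = []
    for k in range(n_coeffs):
        c = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(series))
        c /= math.sqrt(n)
        features.extend([c.real, c.imag])
    return features


def learn_bins(feature_vectors, alphabet_size):
    """Assumed quantization: equi-depth breakpoints, one set per dimension."""
    bins = []
    for column in zip(*feature_vectors):
        ordered = sorted(column)
        bins.append([ordered[(i * len(ordered)) // alphabet_size]
                     for i in range(1, alphabet_size)])
    return bins


def to_word(features, bins):
    """Map each feature value to a letter according to its bin."""
    return "".join(chr(ord("a") + bisect.bisect_right(b, v))
                   for v, b in zip(features, bins))


# Toy data: eight short sine-like series (purely illustrative).
series_set = [[math.sin(0.3 * t * (i + 1)) + 0.1 * i for t in range(16)]
              for i in range(8)]
features = [dft_prefix(s, 3) for s in series_set]
bins = learn_bins(features, alphabet_size=4)
words = [to_word(f, bins) for f in features]
```

Under these assumptions, a word built from fewer dimensions is a prefix of the word built from more, which is the kind of property a trie can use to represent each approximation with a variable number of dimensions.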