iSAX 2.0: Indexing and Mining One Billion Time Series

Authors:
Alessandro Camerra; Themis Palpanas; Jin Shieh; Eamonn Keogh
Venue:
ICDM '10 Proceedings of the 2010 IEEE International Conference on Data Mining
Year:
2010

Cited by (9):

SFA: a symbolic fourier approximation and index for similarity search in high dimensional datasets
Proceedings of the 15th International Conference on Extending Database Technology

Scalable similarity matching in streaming time series
PAKDD'12 Proceedings of the 16th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part II

Revisiting techniques for lowerbounding the dynamic time warping distance
SISAP'12 Proceedings of the 5th international conference on Similarity Search and Applications

dbTrento: the data and information management group at the University of Trento
ACM SIGMOD Record

Similarity search for time series based on efficient warping measure
DM-IKM '12 Proceedings of the Data Mining and Intelligent Knowledge Management Workshop

Time series representation: a random shifting perspective
WAIM'13 Proceedings of the 14th international conference on Web-Age Information Management

A data-adaptive and dynamic segmentation index for whole matching on time series
Proceedings of the VLDB Endowment

Monitoring and diagnosing indicators for business analytics
CASCON '13 Proceedings of the 2013 Conference of the Center for Advanced Studies on Collaborative Research

Discovering longest-lasting correlation in sequence databases
Proceedings of the VLDB Endowment

Abstract

Applications in diverse domains, including astronomy, biology, and the web, increasingly need techniques that can index and mine very large collections of time series. It is not unusual for these applications to involve hundreds of millions to billions of time series. However, none of the relevant techniques proposed in the literature so far has considered data collections much larger than one million time series. In this paper, we describe iSAX 2.0, a data structure designed for indexing and mining truly massive collections of time series. We show that the main bottleneck in mining such massive datasets is the time taken to build the index, and we therefore introduce a novel bulk loading mechanism, the first of its kind specifically tailored to a time series index. We show how our method allows mining on datasets that would otherwise be completely untenable, including the first published experiments to index one billion time series, and experiments in mining massive data from domains as diverse as entomology, DNA, and web-scale image collections.
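
The abstract does not spell out how a time series becomes an index key. As background only, the following is a minimal sketch of the classic SAX discretization that the iSAX family of indexes builds on: z-normalize the series, average it over equal-width segments (PAA), and map each segment mean to a symbol using breakpoints that split the standard normal distribution into equiprobable regions. The function name `sax_word` and its parameters are illustrative assumptions, not identifiers from the paper, and the sketch covers neither the index structure nor the bulk loading mechanism of iSAX 2.0.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax_word(series, segments, cardinality):
    """Return a SAX word: one integer symbol in [0, cardinality) per segment.

    Hypothetical helper for illustration; not code from the iSAX 2.0 paper.
    """
    if len(series) % segments != 0:
        raise ValueError("this sketch assumes len(series) is a multiple of segments")

    # 1. Z-normalize so values are comparable to standard-normal breakpoints.
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        normalized = [0.0] * len(series)  # constant series: all values at the mean
    else:
        normalized = [(x - mu) / sigma for x in series]

    # 2. Piecewise Aggregate Approximation: mean of each equal-width segment.
    width = len(series) // segments
    paa = [mean(normalized[i:i + width]) for i in range(0, len(series), width)]

    # 3. Breakpoints split the standard normal into equiprobable regions.
    gaussian = NormalDist()
    breakpoints = [gaussian.inv_cdf(k / cardinality) for k in range(1, cardinality)]

    # 4. Each segment mean maps to the index of the region it falls in.
    return [bisect_right(breakpoints, value) for value in paa]
```

For example, `sax_word([1, 2, 3, 4, 5, 6, 7, 8], segments=4, cardinality=4)` yields `[0, 1, 2, 3]`: a steadily rising series touches each of the four regions in order. Integer symbols are used here instead of letters or bit strings purely for brevity.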