Disk aware discord discovery: finding unusual time series in terabyte sized datasets

Authors:
Dragomir Yankov; Eamonn Keogh; Umaa Rebbapragada
Affiliations:
University of California, Computer Science and Engineering Department, 92521, Riverside, CA, USA; University of California, Computer Science and Engineering Department, 92521, Riverside, CA, USA; Tufts University, Department of Computer Science, Medford, MA, USA
Venue:
Knowledge and Information Systems
Year:
2008


Abstract

The problem of finding unusual time series has recently attracted much attention, and several promising methods are now in the literature. However, virtually all proposed methods assume that the data reside in main memory. For many real-world problems this is not the case. For example, in astronomy, multi-terabyte time series datasets are the norm. Most current algorithms, when faced with data that cannot fit in main memory, resort to multiple scans of the disk/tape and are thus intractable. In this work we show how one particular definition of unusual time series, the time series discord, can be discovered with a disk aware algorithm. The proposed algorithm is exact and requires only two linear scans of the disk with a tiny buffer of main memory. Furthermore, it is very simple to implement. We use the algorithm to provide further evidence of the effectiveness of the discord definition in areas as diverse as astronomy, web query mining, video surveillance, etc., and show the efficiency of our method on datasets which are many orders of magnitude larger than anything else attempted in the literature.
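
The abstract does not spell out the algorithm, so the following Python is only a minimal sketch of how a two-scan discord search could be organized, not the authors' implementation. It assumes a discord is a series whose distance to its nearest neighbor is at least a user-chosen threshold `r`, uses Euclidean distance, and treats whole series as items. The names `find_discords`, `scan`, and `r` are hypothetical, and an in-memory list stands in for the disk-resident data.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_discords(scan, r):
    """Two-pass discord search (illustrative sketch, hypothetical API).

    scan: zero-argument callable returning a fresh iterator of
          (series_id, series) pairs; each call models one linear disk scan.
    r:    assumed threshold; a series counts as a discord if its
          nearest-neighbor distance is at least r.
    """
    # Pass 1: build a small candidate set. A candidate is dropped as soon
    # as some other series lies closer than r to it, and a series that
    # knocks out a candidate is itself not a candidate.
    candidates = {}
    for sid, s in scan():
        is_candidate = True
        for cid in list(candidates):
            if euclidean(s, candidates[cid]) < r:
                del candidates[cid]
                is_candidate = False
        if is_candidate:
            candidates[sid] = s

    # Pass 2: compare every series against the surviving candidates to
    # discard false positives and record exact nearest-neighbor distances.
    nn_dist = {cid: math.inf for cid in candidates}
    for sid, s in scan():
        for cid in list(candidates):
            if cid == sid:
                continue  # skip the candidate's own copy
            d = euclidean(s, candidates[cid])
            if d < r:
                del candidates[cid]
                del nn_dist[cid]
            else:
                nn_dist[cid] = min(nn_dist[cid], d)

    # Largest nearest-neighbor distance first.
    return sorted(nn_dist.items(), key=lambda kv: -kv[1])


# Toy data: four series cluster near the origin, one lies far away.
data = [(0, [0, 0]), (1, [0, 1]), (2, [1, 0]), (3, [10, 10]), (4, [0.5, 0.5])]
result = find_discords(lambda: iter(data), r=5.0)
```

Under these assumptions only the candidate set has to stay in memory, while the data itself is read sequentially twice. If `r` is set too high the result is empty, so in practice the threshold would need to be estimated or lowered and the search repeated.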