Estimating the number of segments in time series data using permutation tests

Authors:
Kari T. Vasko;Hannu T. T. Toivonen
Affiliations:
-;-
Venue:
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Year:
2002

Citing 0
Cited 11

Clock synchronization for internet measurements: a clustering algorithm

Computer Networks: The International Journal of Computer and Telecommunications Networking
Learning States and Rules for Detecting Anomalies in Time Series

Applied Intelligence
Signal segmentation and modelling based on equipartition principle

DSP'09 Proceedings of the 16th international conference on Digital Signal Processing
Modified Gath--Geva clustering for fuzzy segmentation of multivariate time-series

Fuzzy Sets and Systems
Running damage extraction technique for identifying fatigue damaging events

WSEAS Transactions on Mathematics
Evaluation of BIC and Cross Validation for model selection on sequence segmentations

International Journal of Data Mining and Bioinformatics
Weighted and constrained possibilistic C-means clustering for online fault detection and isolation

Applied Intelligence
A study of modelling non-stationary time series using support vector machines with fuzzy segmentation information

CIS'05 Proceedings of the 2005 international conference on Computational Intelligence and Security - Volume Part I
Correlation based dynamic time warping of multivariate time series

Expert Systems with Applications: An International Journal
Time-series data mining

ACM Computing Surveys (CSUR)
Estimating the predominant number of clusters in a dataset

Intelligent Data Analysis

Abstract

Segmentation is a popular technique for discovering structure in time series data. We address the largely open problem of estimating the number of segments that can be reliably discovered. We introduce a novel method for the problem, called Pete. Pete is based on permutation testing. The problem is an instance of model (dimension) selection. The proposed method analyzes the possible overfit of a model to the available data rather than using a term for penalizing model complexity. In this respect the approach is more similar to cross-validation than to regularization-based techniques (e.g., AIC, BIC, MDL, MML). Further, the method produces a p value for each increase in the number of segments. This gives the user an overview of the statistical significance of the segmentations. We evaluate the performance of the proposed method using both synthetic and real time series data. The experiments show that permutation testing gives realistic results about the number of reliably identifiable segments and that it compares favorably with the Monte Carlo cross-validation (MCCV) and commonly used BIC criteria.
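
To make the idea of a p value per increase in the number of segments concrete, here is a minimal Python sketch. It is not the authors' Pete implementation: the abstract does not specify the segment model, the test statistic, or the permutation scheme, so everything below is an assumption made for illustration. The sketch assumes a piecewise-constant model fitted by dynamic programming, uses the relative reduction in squared error when going from k to k+1 segments as the statistic, and builds the null distribution by shuffling the whole series. The function names `segmentation_errors` and `permutation_p_values` are hypothetical.

```python
import numpy as np


def segmentation_errors(x, max_k):
    """Minimal squared error of a piecewise-constant fit with 1..max_k segments.

    Assumption: a least-squares piecewise-constant model, solved exactly
    by dynamic programming over prefix sums.
    """
    n = len(x)
    s1 = np.concatenate(([0.0], np.cumsum(x)))
    s2 = np.concatenate(([0.0], np.cumsum(x * x)))

    def cost(i, j):
        # Squared error of x[i:j] around its own mean (i or j may be arrays).
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / (j - i)

    # best[k, j]: minimal error of splitting x[:j] into k + 1 segments.
    best = np.full((max_k, n + 1), np.inf)
    best[0, 1:] = cost(0, np.arange(1, n + 1))
    for k in range(1, max_k):
        for j in range(k + 1, n + 1):
            i = np.arange(k, j)  # candidate start indices of the last segment
            best[k, j] = np.min(best[k - 1, i] + cost(i, j))
    return best[:, n]


def relative_gains(err):
    # Relative error reduction for each step k -> k + 1 segments.
    # Assumes noisy data, so no fit has exactly zero error.
    return (err[:-1] - err[1:]) / err[:-1]


def permutation_p_values(x, max_k, n_perm=99, seed=0):
    """Permutation p value for each increase 1->2, 2->3, ..., (max_k-1)->max_k.

    Assumption: the null distribution comes from shuffling the entire
    series, which destroys all temporal structure.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    observed = relative_gains(segmentation_errors(x, max_k))
    count = np.zeros(max_k - 1)
    for _ in range(n_perm):
        shuffled = rng.permutation(x)
        count += relative_gains(segmentation_errors(shuffled, max_k)) >= observed
    # Add-one correction keeps every p value strictly positive.
    return (count + 1) / (n_perm + 1)


# Usage: a synthetic series with one clear level shift (two true segments).
rng = np.random.default_rng(1)
series = np.concatenate([rng.normal(0.0, 1.0, 30), rng.normal(5.0, 1.0, 30)])
p_values = permutation_p_values(series, max_k=4)
print(p_values)  # one p value per step: 1->2, 2->3, 3->4
```

Under these assumptions, a small p value for the step from k to k+1 segments suggests that the extra segment captures structure that a randomly ordered series would rarely show, while a large p value suggests the extra segment is mostly fitting noise. A whole-series shuffle is a crude null for steps beyond the first, so a more careful scheme would be needed in practice.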