Estimating the number of segments in time series data using permutation tests

  • Authors:
  • Kari T. Vasko; Hannu T. T. Toivonen

  • Venue:
  • ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
  • Year:
  • 2002

Abstract

Segmentation is a popular technique for discovering structure in time series data. We address the largely open problem of estimating the number of segments that can be reliably discovered. We introduce a novel method for the problem, called Pete, which is based on permutation testing. The problem is an instance of model (dimension) selection. Rather than using a term that penalizes model complexity, the proposed method analyzes the possible overfit of a model to the available data. In this respect the approach is closer to cross-validation than to regularization-based techniques (e.g., AIC, BIC, MDL, MML). Further, the method produces a p value for each increase in the number of segments, which gives the user an overview of the statistical significance of the segmentations. We evaluate the performance of the proposed method using both synthetic and real time series data. The experiments show that permutation testing gives realistic results about the number of reliably identifiable segments and that it compares favorably with Monte Carlo cross-validation (MCCV) and the commonly used BIC criterion.
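The abstract does not spell out how Pete builds its null distribution, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes piecewise-constant segments fitted by least squares with the standard O(k·n²) dynamic program. It also assumes a null model for the step from k to k+1 segments in which values are shuffled within each segment of the best k-segmentation, which keeps the k-level structure while destroying anything finer. The function names `optimal_segmentation` and `permutation_p_value`, and the parameters `n_perm` and `seed`, are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def optimal_segmentation(x: Sequence[float], k: int) -> Tuple[float, List[Tuple[int, int]]]:
    """Best k-segmentation of x (k <= len(x)) into piecewise-constant segments.

    Returns (total squared error, [(start, end), ...]) with end exclusive,
    using the classic O(k * n^2) dynamic program over prefix sums.
    """
    n = len(x)
    s, sq = [0.0], [0.0]  # prefix sums of values and of squared values
    for v in x:
        s.append(s[-1] + v)
        sq.append(sq[-1] + v * v)

    def cost(i: int, j: int) -> float:
        # Squared error of x[i:j] around its own mean.
        t = s[j] - s[i]
        return (sq[j] - sq[i]) - t * t / (j - i)

    inf = float("inf")
    err = [[inf] * (n + 1) for _ in range(k + 1)]  # err[m][j]: x[:j] in m segments
    back = [[0] * (n + 1) for _ in range(k + 1)]   # start index of the last segment
    err[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = err[m - 1][i] + cost(i, j)
                if c < err[m][j]:
                    err[m][j], back[m][j] = c, i

    bounds, j = [], n
    for m in range(k, 0, -1):  # walk the back-pointers from the end
        i = back[m][j]
        bounds.append((i, j))
        j = i
    return err[k][n], bounds[::-1]


def permutation_p_value(x: Sequence[float], k: int, n_perm: int = 199, seed: int = 0) -> float:
    """Permutation p value for adding a (k+1)-th segment (hypothetical procedure).

    Assumed null model: shuffle values inside each segment of the best
    k-segmentation, then count how often the error drop from k to k+1
    segments on shuffled data is at least the drop observed on x.
    """
    rng = random.Random(seed)
    e_k, bounds = optimal_segmentation(x, k)
    observed = e_k - optimal_segmentation(x, k + 1)[0]
    hits = 0
    for _ in range(n_perm):
        y: List[float] = []
        for i, j in bounds:
            seg = list(x[i:j])
            rng.shuffle(seg)
            y.extend(seg)
        drop = optimal_segmentation(y, k)[0] - optimal_segmentation(y, k + 1)[0]
        if drop >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p > 0


if __name__ == "__main__":
    # Deterministic toy series with a single level shift at index 20.
    x = [0.1 * ((i * 7) % 5) for i in range(20)] + [5 + 0.1 * ((i * 3) % 5) for i in range(20)]
    for k in (1, 2):
        print(k, "->", k + 1, "p =", permutation_p_value(x, k, n_perm=99))
```

For the toy series, every shuffle at k = 1 mixes the two levels, so no shuffled drop reaches the observed one and the add-one estimate bottoms out at 1/(99 + 1) = 0.01. At k = 2 the p value is not expected to be small, because the series has no structure beyond two segments. Reporting one such p value per increment mirrors the per-step significance overview the abstract describes.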