Unsupervised mining of long time series based on latent topic model

  • Authors:
  • Jin Wang; Xiangping Sun; Mary F. H. She; Abbas Kouzani; Saeid Nahavandi

  • Affiliations:
  • Institute for Technology Research and Innovation, Deakin University, Geelong, VIC 3217, Australia and Center for Intelligent Systems Research, Deakin University, Geelong, VIC 3217, Australia;Institute for Technology Research and Innovation, Deakin University, Geelong, VIC 3217, Australia;Institute for Technology Research and Innovation, Deakin University, Geelong, VIC 3217, Australia;School of Engineering, Deakin University, Geelong, VIC 3217, Australia;Center for Intelligent Systems Research, Deakin University, Geelong, VIC 3217, Australia

  • Venue:
  • Neurocomputing
  • Year:
  • 2013

Abstract

This paper presents a novel unsupervised method for mining time series based on two generative topic models: probabilistic Latent Semantic Analysis (pLSA) and Latent Dirichlet Allocation (LDA). The proposed method treats each time series as a text document and extracts a set of local patterns from the sequence as words by sliding a short temporal window along it. Motivated by the success of latent topic models in text document analysis, the method extends these models to find the underlying structure of time series in an unsupervised manner. The clusters or categories of unlabeled time series are discovered automatically by the latent topic models using a bag-of-patterns representation. The proposed method was validated experimentally on two sets of time series data extracted from a public Electrocardiography (ECG) database and compared with the baseline k-means and Normalized Cuts approaches. In addition, the impact of the bag-of-patterns parameters was investigated. Experimental results demonstrate that the proposed unsupervised method not only outperforms the baseline k-means and Normalized Cuts approaches in learning semantic categories of the unlabeled time series, but is also relatively stable with respect to the bag-of-patterns parameters. To the best of our knowledge, this work is the first attempt to explore latent topic models for unsupervised mining of time series data.
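
The bag-of-patterns step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how a window is turned into a word, so the encoding below is an assumption: each window is z-normalised, averaged over a few segments, and each segment mean is mapped to one of four letters using SAX-style breakpoints. The function names `window_to_word` and `bag_of_patterns` and the parameters `n_segments`, `breakpoints`, `window`, and `step` are hypothetical and not taken from the paper.

```python
from collections import Counter
from statistics import mean, pstdev


def window_to_word(window, n_segments=4, breakpoints=(-0.67, 0.0, 0.67)):
    """Map one window to a short symbolic word.

    Assumed SAX-like encoding, not necessarily the paper's.
    Assumes len(window) >= n_segments; trailing samples that do not
    fill a whole segment are ignored.
    """
    mu = mean(window)
    sigma = pstdev(window)
    # z-normalise so the word reflects shape rather than offset or scale
    if sigma == 0:
        z = [0.0] * len(window)
    else:
        z = [(v - mu) / sigma for v in window]
    seg_len = len(z) // n_segments
    letters = []
    for s in range(n_segments):
        seg = z[s * seg_len:(s + 1) * seg_len]
        # count how many breakpoints the segment mean exceeds (0..3)
        level = sum(mean(seg) > b for b in breakpoints)
        letters.append("abcd"[level])
    return "".join(letters)


def bag_of_patterns(series, window=8, step=1):
    """Slide a window along the series and count the resulting words."""
    starts = range(0, len(series) - window + 1, step)
    return Counter(window_to_word(series[i:i + window]) for i in starts)


if __name__ == "__main__":
    rising = list(range(9))            # two overlapping rising windows
    print(bag_of_patterns(rising))     # word counts for this "document"
```

Each series then becomes a word-count vector, and stacking these vectors gives the document-term matrix that a topic model such as pLSA or LDA takes as input. The window length and step here stand in for the bag-of-patterns parameters whose impact the abstract says was investigated.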