A method for determination on HMM distance threshold

Authors:
Jiangjiao Duan;Jianping Zeng;Dongzhan Zhang
Affiliations:
Department of Computer Science, Xiamen University, Xiamen, P.R.China;School of Computer Science, Fudan University, Shanghai, P.R.China;Department of Computer Science, Xiamen University, Xiamen, P.R.China
Venue:
FSKD'09 Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 1
Year:
2009

Abstract

Hidden Markov models (HMMs) are widely used in time series modeling. To evaluate the similarity between a sequence and an HMM, one usually computes the likelihood of the sequence with respect to the HMM. A threshold value is then needed to decide whether the sequence is well approximated by the model, but this threshold is usually chosen manually. Here, we propose a method, HTDM, that determines the threshold automatically. Based on likelihood statistics, we conclude that the likelihood follows a normal distribution, and we estimate the standard deviation of that distribution. The distance threshold is then derived from the "three sigma" rule. In the experiments, we compare the HMM-based hierarchical clustering algorithm HHCH using HTDM against the algorithm HBHCTS, in which the threshold is set manually. Experimental results show that the proposed method is effective on both synthetic and real-world datasets.
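The abstract describes HTDM only at a high level, so the following is a minimal sketch of the general idea under our own assumptions, not the authors' implementation. It assumes a discrete-observation HMM whose parameters are already known (training, e.g. with Baum-Welch, is out of scope), computes the log-likelihood with the scaled forward algorithm, normalizes it by sequence length so sequences of different lengths are comparable (an assumption not stated in the abstract), and takes the one-sided lower bound mean - 3*sigma as the threshold. The toy model `PI`, `A`, `B` and the helper names `log_likelihood`, `score`, `three_sigma_threshold`, `fits_model` and `sample` are hypothetical.

```python
import math
import random
import statistics

# Hypothetical toy HMM (2 hidden states, 2 symbols); not taken from the paper.
PI = [0.6, 0.4]                 # PI[i]: initial probability of state i
A = [[0.7, 0.3], [0.4, 0.6]]    # A[i][j]: transition probability i -> j
B = [[0.9, 0.1], [0.2, 0.8]]    # B[i][k]: probability of emitting symbol k in state i


def log_likelihood(obs, pi, A, B):
    """Return log P(obs | HMM) for a non-empty obs via the scaled forward algorithm."""
    n = len(pi)
    log_p = 0.0
    alpha = None
    for t, o in enumerate(obs):
        if t == 0:
            # alpha_1(i) = pi_i * b_i(o_1)
            alpha = [pi[i] * B[i][o] for i in range(n)]
        else:
            # alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(o_t)
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                     for j in range(n)]
        c = sum(alpha)                  # c_t = P(o_t | o_1..o_{t-1})
        log_p += math.log(c)
        alpha = [a / c for a in alpha]  # rescale to avoid numerical underflow
    return log_p


def score(obs, pi, A, B):
    """Per-symbol log-likelihood (length normalization is our assumption)."""
    return log_likelihood(obs, pi, A, B) / len(obs)


def three_sigma_threshold(scores):
    """Lower bound mean - 3 * sigma for scores assumed to be roughly normal."""
    mu = statistics.mean(scores)
    sigma = statistics.stdev(scores)    # sample standard deviation
    return mu - 3 * sigma


def fits_model(obs, pi, A, B, threshold):
    """Treat obs as well approximated by the HMM if its score clears the threshold."""
    return score(obs, pi, A, B) >= threshold


def sample(pi, A, B, length, rng):
    """Draw one observation sequence of the given length from the HMM."""
    def draw(p):
        return rng.choices(range(len(p)), weights=p)[0]
    state = draw(pi)
    obs = []
    for _ in range(length):
        obs.append(draw(B[state]))
        state = draw(A[state])
    return obs


if __name__ == "__main__":
    rng = random.Random(0)
    # Sequences generated by the model itself stand in for reference data.
    reference = [sample(PI, A, B, 50, rng) for _ in range(200)]
    thr = three_sigma_threshold([score(s, PI, A, B) for s in reference])
    print(f"threshold = {thr:.4f}")
    print("model sample accepted:", fits_model(sample(PI, A, B, 50, rng), PI, A, B, thr))
    print("alternating 0/1 accepted:", fits_model([0, 1] * 25, PI, A, B, thr))
```

The threshold here is one-sided because only unusually low likelihoods signal a poor fit; reading the three-sigma rule as a two-sided interval mean +/- 3*sigma would be an equally plausible variant, and the abstract does not say which one HTDM uses.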