Removing statistical biases in unsupervised sequence learning

  • Authors:
  • Yoav Horman; Gal A. Kaminka

  • Affiliations:
  • The MAVERICK Group, Department of Computer Science, Bar-Ilan University, Israel; The MAVERICK Group, Department of Computer Science, Bar-Ilan University, Israel

  • Venue:
  • IDA'05 Proceedings of the 6th international conference on Advances in Intelligent Data Analysis
  • Year:
  • 2005

Abstract

Unsupervised sequence learning is important to many applications. A learner is presented with unlabeled sequential data and must discover sequential patterns that characterize the data. Popular approaches to such learning include statistical analysis and frequency-based methods. We empirically compare these approaches and find that both suffer from biases toward shorter sequences and from an inability to group together multiple instances of the same pattern. We provide methods to address these deficiencies, and evaluate them extensively on several synthetic and real-world data sets. The results show significant improvements in all learning methods used.
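
The bias toward shorter sequences can be illustrated with a minimal sketch of plain frequency counting. This is not the authors' method or their correction; it only shows, under simple assumptions, why raw counts favor short patterns: every contiguous piece of a pattern occurs at least as often as the pattern itself. The function name `ngram_counts` and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter


def ngram_counts(sequences, max_len):
    """Count every contiguous subsequence of length 1..max_len."""
    counts = Counter()
    for seq in sequences:
        for n in range(1, max_len + 1):
            for i in range(len(seq) - n + 1):
                counts[tuple(seq[i:i + n])] += 1
    return counts


# Hypothetical toy data: the pattern a-b-c embedded in unrelated symbols.
data = [list("xabcy"), list("abcz"), list("qabw")]
counts = ngram_counts(data, max_len=3)

# A shorter piece of a pattern can never be rarer than the full pattern,
# so ranking candidates by raw frequency alone prefers the shorter ones.
for pattern in [("a",), ("a", "b"), ("a", "b", "c")]:
    print(pattern, counts[pattern])
```

Ranking by raw count would place `("a", "b")` above `("a", "b", "c")` even when the longer pattern is the one of interest, which is the kind of effect a length-aware correction has to compensate for.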