Efficient Mining of Gap-Constrained Subsequences and Its Various Applications

  • Authors:
  • Chun Li;Qingyan Yang;Jianyong Wang;Ming Li

  • Affiliations:
  • Tsinghua University;Tsinghua University;Tsinghua University;Tsinghua University

  • Venue:
  • ACM Transactions on Knowledge Discovery from Data (TKDD)
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Mining frequent subsequence patterns is a typical data-mining problem and various efficient sequential pattern mining algorithms have been proposed. In many application domains (e.g., biology), the frequent subsequences confined by the predefined gap requirements are more meaningful than the general sequential patterns. In this article, we propose two algorithms, Gap-BIDE for mining closed gap-constrained subsequences from a set of input sequences, and Gap-Connect for mining repetitive gap-constrained subsequences from a single input sequence. Inspired by some state-of-the-art closed or constrained sequential pattern mining algorithms, the Gap-BIDE algorithm adopts an efficient approach to finding the complete set of closed sequential patterns with gap constraints, while the Gap-Connect algorithm efficiently mines an approximate set of long patterns by connecting short patterns. We also present several methods for feature selection from the set of gap-constrained patterns for the purpose of classification and clustering. Our extensive performance study shows that our approaches are very efficient in mining frequent subsequences with gap constraints, and the gap-constrained pattern based classification/clustering approaches can achieve high-quality results.