Prominent streak discovery in sequence data

  • Authors: Xiao Jiang; Chengkai Li; Ping Luo; Min Wang; Yong Yu

  • Affiliations: Shanghai Jiao Tong University, Shanghai, China; University of Texas at Arlington, Arlington, TX, USA; HP Labs China, Beijing, China; HP Labs China, Beijing, China; Shanghai Jiao Tong University, Shanghai, China

  • Venue: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
  • Year: 2011

Abstract

This paper studies the problem of prominent streak discovery in sequence data. Given a sequence of values, a prominent streak is a long consecutive subsequence consisting of only large (small) values. For finding prominent streaks, we make the observation that prominent streaks are skyline points in two dimensions: streak interval length and minimum value in the interval. Our solution thus hinges on separating the two steps in prominent streak discovery: candidate streak generation and the skyline operation over candidate streaks. For candidate generation, we propose the concept of local prominent streak (LPS). We prove that prominent streaks are a subset of LPSs and that the number of LPSs is less than the length of the data sequence, in comparison with the quadratic number of candidates produced by a brute-force baseline method. We develop efficient algorithms based on the concept of LPS. The non-linear LPS-based method (NLPS) considers a superset of LPSs as candidates, and the linear LPS-based method (LLPS) is further guaranteed to consider only LPSs. Experiments on multiple real datasets verified the effectiveness of the proposed methods and showed orders-of-magnitude performance improvement over the baseline method.
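The two-step structure described in the abstract (candidate generation, then a skyline over the candidates) can be illustrated with a minimal Python sketch. It is not the authors' NLPS or LLPS algorithm. Several things here are assumptions, not taken from the abstract: the dominance rule (a streak dominates another if it is at least as long and has at least as large a minimum, with one of the two strictly larger), the reading of a local prominent streak as an interval that cannot be extended on either side without lowering its minimum, and all function names. Only the "large values" case is shown.

```python
from typing import List, Tuple

# A streak is (left index, right index, minimum value), indices inclusive.
Streak = Tuple[int, int, float]


def brute_force_candidates(values: List[float]) -> List[Streak]:
    """Baseline: every interval is a candidate, so O(n^2) candidates."""
    out = []
    for left in range(len(values)):
        lowest = values[left]
        for right in range(left, len(values)):
            lowest = min(lowest, values[right])
            out.append((left, right, lowest))
    return out


def maximal_candidates(values: List[float]) -> List[Streak]:
    """Assumed reading of LPS: intervals that cannot be extended without
    lowering their minimum. A monotonic stack yields at most n of them."""
    n = len(values)
    out = []
    stack = []  # (start index, value), values strictly increasing
    for i in range(n + 1):
        start = i
        # Close every open interval whose minimum exceeds the new value
        # (or close everything once the sequence ends).
        while stack and (i == n or stack[-1][1] > values[i]):
            start, lowest = stack.pop()
            out.append((start, i - 1, lowest))
        if i < n and (not stack or stack[-1][1] < values[i]):
            stack.append((start, values[i]))
    return out


def skyline(candidates: List[Streak]) -> List[Streak]:
    """Keep candidates not dominated in (length, minimum value).
    Quadratic in the number of candidates; chosen for clarity only."""
    def dominates(a: Streak, b: Streak) -> bool:
        len_a, len_b = a[1] - a[0] + 1, b[1] - b[0] + 1
        return (len_a >= len_b and a[2] >= b[2]
                and (len_a > len_b or a[2] > b[2]))
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates)]


values = [3, 1, 4, 4, 2]
baseline = sorted(skyline(brute_force_candidates(values)))
reduced = sorted(skyline(maximal_candidates(values)))
print(baseline)  # [(0, 4, 1), (2, 3, 4), (2, 4, 2)]
print(reduced)   # same streaks, found from 4 candidates instead of 15
```

On this five-element sequence both candidate sets give the same skyline, but the brute-force step produces 15 candidates while the maximal-interval step produces 4, which reflects the quadratic-versus-linear gap the abstract describes.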