Prominent streak discovery in sequence data

Authors:
Xiao Jiang;Chengkai Li;Ping Luo;Min Wang;Yong Yu
Affiliations:
Shanghai Jiao Tong University, Shanghai, China;University of Texas at Arlington, Arlington, TX, USA;HP Labs China, Beijing, China;HP Labs China, Beijing, China;Shanghai Jiao Tong University, Shanghai, China
Venue:
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2011

Abstract

This paper studies the problem of prominent streak discovery in sequence data. Given a sequence of values, a prominent streak is a long consecutive subsequence consisting of only large (small) values. To find prominent streaks, we observe that they are skyline points in two dimensions: streak interval length and minimum value in the interval. Our solution thus hinges on separating the two steps of prominent streak discovery: candidate streak generation and the skyline operation over candidate streaks. For candidate generation, we propose the concept of local prominent streak (LPS). We prove that prominent streaks are a subset of LPSs and that the number of LPSs is less than the length of the data sequence, in contrast to the quadratic number of candidates produced by a brute-force baseline method. We develop efficient algorithms based on the concept of LPS. The non-linear LPS-based method (NLPS) considers a superset of LPSs as candidates, and the linear LPS-based method (LLPS) is further guaranteed to consider only LPSs. Experiments on multiple real datasets verified the effectiveness of the proposed methods and showed orders-of-magnitude performance improvement over the baseline method.
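
To make the two-step view concrete, the following is a minimal sketch of the brute-force baseline the abstract alludes to, for the "large values" case: enumerate every interval as a candidate streak, then keep the skyline points over (interval length, minimum value). It is not the NLPS or LLPS method. The function names, the tuple layout, and the dominance rule (at least as good in both dimensions and strictly better in one, the usual skyline definition) are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: (left index, right index, minimum value),
# with both indices inclusive and 0-based.
Streak = Tuple[int, int, float]


def dominates(a: Streak, b: Streak) -> bool:
    """Assumed skyline dominance: a is at least as long and has at least as
    large a minimum as b, and is strictly better in one of the two."""
    a_len, b_len = a[1] - a[0] + 1, b[1] - b[0] + 1
    return (a_len >= b_len and a[2] >= b[2]
            and (a_len > b_len or a[2] > b[2]))


def candidate_streaks(seq: Sequence[float]) -> List[Streak]:
    """Step 1: enumerate all n(n+1)/2 intervals with their minimum value."""
    candidates: List[Streak] = []
    for left in range(len(seq)):
        running_min = seq[left]
        for right in range(left, len(seq)):
            running_min = min(running_min, seq[right])
            candidates.append((left, right, running_min))
    return candidates


def prominent_streaks_bruteforce(seq: Sequence[float]) -> List[Streak]:
    """Step 2: keep the candidates that no other candidate dominates.
    The pairwise check is deliberately naive and only suits short inputs."""
    candidates = candidate_streaks(seq)
    return [s for s in candidates
            if not any(dominates(other, s) for other in candidates)]


if __name__ == "__main__":
    data = [3, 1, 7, 7, 2, 5, 4, 6, 7, 3]
    for left, right, low in prominent_streaks_bruteforce(data):
        print(f"[{left}, {right}] length={right - left + 1} min={low}")
```

The quadratic candidate count in `candidate_streaks` is the cost that the LPS concept is meant to avoid: by the abstract's claim, the number of LPSs is smaller than the sequence length, so the skyline step runs over far fewer candidates.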