Mining frequent serial episodes over uncertain sequence data

Authors:
Li Wan; Ling Chen; Chengqi Zhang
Affiliations:
College Chongqing University, China; University of Technology, Sydney, Australia; University of Technology, Sydney, Australia
Venue:
Proceedings of the 16th International Conference on Extending Database Technology
Year:
2013

Abstract

Data uncertainty poses unique challenges to nearly all types of data mining tasks, creating a need for uncertain data mining. In this paper, we focus on the particular task of mining probabilistic frequent serial episodes (P-FSEs) from uncertain sequence data, which arises in many real applications, such as sensor readings and customer purchase sequences. We first define the notion of P-FSEs, based on the frequentness probabilities of serial episodes under possible world semantics. To discover P-FSEs over an uncertain sequence, we propose: 1) an exact approach that computes the exact frequentness probabilities of episodes; 2) an approximate approach that approximates the frequency of episodes using probability models; 3) an optimized approach that efficiently prunes a candidate episode by estimating an upper bound of its frequentness probability using approximation techniques. We conduct extensive experiments to evaluate the performance of the developed algorithms. Our experimental results show that: 1) while existing research demonstrates that approximate approaches are orders of magnitude faster than exact approaches, for P-FSE mining the efficiency improvement of the approximate approach over the exact approach is marginal; 2) although the normal-distribution-based approximation is recognized as fairly accurate when the data set is large enough, for P-FSE mining the binomial-distribution-based approximation achieves higher accuracy when the number of episode occurrences is limited; 3) the optimized approach clearly outperforms the other two approaches in runtime while achieving very high accuracy.
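
The contrast between an exact frequentness probability and its normal and binomial approximations can be illustrated with a minimal sketch. It rests on a simplifying assumption that the abstract does not state: each candidate occurrence of an episode exists independently with a known probability, so the support count follows a Poisson binomial distribution. The function names (exact_frequentness, normal_approx, binomial_approx) and the mean-matching choice for the binomial parameter are hypothetical and are not taken from the paper, whose algorithms may differ.

```python
import math


def exact_frequentness(probs, min_sup):
    """P(support >= min_sup), assuming independent occurrences (an assumption)."""
    # dist[k] = probability that exactly k of the occurrences seen so far exist
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # this occurrence is absent
            nxt[k + 1] += mass * p       # this occurrence is present
        dist = nxt
    return sum(dist[min_sup:])


def normal_approx(probs, min_sup):
    """Normal approximation of the same tail, with a continuity correction."""
    mu = sum(probs)
    var = sum(p * (1.0 - p) for p in probs)
    if var == 0.0:
        # Degenerate case: the support is deterministic
        return 1.0 if mu >= min_sup else 0.0
    z = (min_sup - 0.5 - mu) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # equals 1 - Phi(z)


def binomial_approx(probs, min_sup):
    """Binomial approximation using the mean occurrence probability (assumed choice)."""
    n = len(probs)
    if n == 0:
        return 1.0 if min_sup <= 0 else 0.0
    p = sum(probs) / n
    return sum(math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
               for k in range(min_sup, n + 1))


if __name__ == "__main__":
    occurrence_probs = [0.9, 0.8, 0.7, 0.6, 0.3]  # made-up illustrative values
    for name, fn in [("exact", exact_frequentness),
                     ("normal", normal_approx),
                     ("binomial", binomial_approx)]:
        print(name, fn(occurrence_probs, 3))
```

Under this assumption, an episode would be reported as probabilistically frequent when the computed value reaches a user-chosen probability threshold. The dynamic program takes time quadratic in the number of occurrences, while both approximations need only summary statistics of the occurrence probabilities.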