Mining probabilistically frequent sequential patterns in uncertain databases

Authors: Zhou Zhao; Da Yan; Wilfred Ng
Affiliations: The Hong Kong University of Science and Technology, Hong Kong (all three authors)
Venue: Proceedings of the 15th International Conference on Extending Database Technology
Year: 2012

References (22)
FreeSpan: frequent pattern-projected sequential pattern mining. Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining.
SPADE: an efficient algorithm for mining frequent sequences. Machine Learning.
Introduction to algorithms.
Mining long sequential patterns in a noisy environment. Proceedings of the 2002 ACM SIGMOD international conference on Management of data.
Mining Sequential Patterns: Generalizations and Performance Improvements. EDBT '96 Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology.
Mining Sequential Patterns. ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering.
PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth. Proceedings of the 17th International Conference on Data Engineering.
Trio: a system for data, uncertainty, and lineage. VLDB '06 Proceedings of the 32nd international conference on Very large data bases.
Trajectory pattern mining. Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining.
Model-driven data acquisition in sensor networks. VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30.
Finding frequent items in probabilistic data. Proceedings of the 2008 ACM SIGMOD international conference on Management of data.
A Survey of Uncertain Data Algorithms and Applications. IEEE Transactions on Knowledge and Data Engineering.
Probabilistic Event Extraction from RFID Data. ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering.
Frequent pattern mining with uncertain data. Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining.
Probabilistic frequent itemset mining in uncertain databases. Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining.
Mining frequent itemsets from uncertain data. PAKDD'07 Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining.
Leveraging spatio-temporal redundancy for RFID data cleansing. Proceedings of the 2010 ACM SIGMOD International Conference on Management of data.
Probabilistic string similarity joins. Proceedings of the 2010 ACM SIGMOD International Conference on Management of data.
Mining uncertain data with probabilistic guarantees. Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining.
Set similarity join on probabilistic data. Proceedings of the VLDB Endowment.
Clustering uncertain trajectories. Knowledge and Information Systems.
Mining sequential patterns from probabilistic databases. PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part II.

Cited by (4)
Mining frequent serial episodes over uncertain sequence data. Proceedings of the 16th International Conference on Extending Database Technology.
Projection-based partial periodic pattern mining for event sequences. Expert Systems with Applications: An International Journal.
Editorial: Pattern-growth based frequent serial episode discovery. Data & Knowledge Engineering.
Mining order-preserving submatrices from probabilistic matrices. ACM Transactions on Database Systems (TODS).

Abstract

Data uncertainty is inherent in many real-world applications such as environmental surveillance and mobile tracking. Consequently, mining sequential patterns from inaccurate data, such as sensor readings and GPS trajectories, is important for discovering hidden knowledge in these applications. Previous work uses expected support as the measure of pattern frequentness; this measure has inherent weaknesses with respect to the underlying probability model and is therefore ineffective for mining high-quality sequential patterns from uncertain sequence databases. In this paper, we propose to measure pattern frequentness based on possible world semantics. We establish two uncertain sequence data models abstracted from many real-life applications involving uncertain sequence data, and formulate the problem of mining probabilistically frequent sequential patterns (p-FSPs) from data that conform to these models. Based on the prefix-projection strategy of the well-known PrefixSpan algorithm, we develop two new algorithms, collectively called U-PrefixSpan, for p-FSP mining. U-PrefixSpan effectively avoids the problem of "possible world explosion" and, when combined with our three pruning techniques and one validating technique, achieves good performance. The efficiency and effectiveness of U-PrefixSpan are verified through extensive experiments on both real and synthetic datasets.
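To make the contrast between expected support and possible world semantics concrete, here is a minimal Python sketch under a simplifying assumption: each sequence i independently contains a candidate pattern with a known probability p_i. How p_i is derived under the paper's two uncertain sequence data models is not reproduced here. The sketch also assumes the common definition that a pattern is probabilistically frequent when Pr(support ≥ minsup) ≥ τ. All function names are hypothetical. The brute-force function enumerates all 2^n possible worlds, which shows why naive evaluation explodes; the dynamic program computes the same probability in O(n · minsup) time by tracking only support counts up to minsup. This illustrates the frequentness measure only and is not U-PrefixSpan itself.

```python
from itertools import product


def expected_support(probs):
    """Expected support: the sum of per-sequence containment probabilities."""
    return sum(probs)


def frequentness_probability(probs, minsup):
    """Pr(support >= minsup), assuming sequences contain the pattern independently.

    dist[k] holds Pr(support == k) for k < minsup, and dist[minsup] holds
    Pr(support >= minsup), so the table never grows beyond minsup + 1 entries.
    """
    dist = [1.0] + [0.0] * minsup  # before any sequence, support is 0 with prob. 1
    for p in probs:
        new = [0.0] * (minsup + 1)
        for k, q in enumerate(dist):
            if k == minsup:
                new[minsup] += q          # already reached minsup; stays there
            else:
                new[k] += q * (1.0 - p)   # this sequence lacks the pattern
                new[k + 1] += q * p       # this sequence contains the pattern
        dist = new
    return dist[minsup]


def frequentness_by_enumeration(probs, minsup):
    """Same quantity by summing over all 2^n possible worlds (exponential)."""
    total = 0.0
    for world in product([0, 1], repeat=len(probs)):
        pr = 1.0
        for contains, p in zip(world, probs):
            pr *= p if contains else 1.0 - p
        if sum(world) >= minsup:
            total += pr
    return total


def is_probabilistically_frequent(probs, minsup, tau):
    """Hypothetical p-FSP test: Pr(support >= minsup) >= tau."""
    return frequentness_probability(probs, minsup) >= tau


if __name__ == "__main__":
    db_a = [1.0, 1.0, 0.0, 0.0]  # hypothetical containment probabilities
    db_b = [0.5, 0.5, 0.5, 0.5]
    for name, probs in (("A", db_a), ("B", db_b)):
        print(name, expected_support(probs), frequentness_probability(probs, 2))
    # A 2.0 1.0
    # B 2.0 0.6875
```

Both toy databases have an expected support of 2, yet with minsup = 2 and a threshold such as τ = 0.9 only A qualifies as probabilistically frequent; expected support alone cannot separate the two.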