Mining probabilistically frequent sequential patterns in uncertain databases

  • Authors: Zhou Zhao; Da Yan; Wilfred Ng

  • Affiliations: The Hong Kong University of Science and Technology, Hong Kong (all three authors)

  • Venue: Proceedings of the 15th International Conference on Extending Database Technology
  • Year: 2012

Abstract

Data uncertainty is inherent in many real-world applications such as environmental surveillance and mobile tracking. As a result, mining sequential patterns from inaccurate data, such as sensor readings and GPS trajectories, is important for discovering hidden knowledge in such applications. Previous work uses expected support as the measurement of pattern frequentness, which has inherent weaknesses with respect to the underlying probability model, and is therefore ineffective for mining high-quality sequential patterns from uncertain sequence databases. In this paper, we propose to measure pattern frequentness based on the possible world semantics. We establish two uncertain sequence data models abstracted from many real-life applications involving uncertain sequence data, and formulate the problem of mining probabilistically frequent sequential patterns (or p-FSPs) from data that conform to our models. Based on the prefix-projection strategy of the famous PrefixSpan algorithm, we develop two new algorithms, collectively called U-PrefixSpan, for p-FSP mining. U-PrefixSpan effectively avoids the problem of "possible world explosion", and when combined with our three pruning techniques and one validating technique, achieves good performance. The efficiency and effectiveness of U-PrefixSpan are verified through extensive experiments on both real and synthetic datasets.
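
To make the contrast between expected support and probabilistic frequentness concrete, here is a minimal sketch. It is not the U-PrefixSpan algorithm and does not reproduce either of the paper's data models. It assumes a simplified setting in which each sequence contains a given pattern independently with a known probability; the names `probs`, `min_sup`, and the function names are hypothetical. Under that assumption, the probability that the support reaches `min_sup` can be computed with a standard dynamic program in O(n · min_sup) time, while the brute-force version enumerates all 2^n possible worlds and illustrates why "possible world explosion" has to be avoided.

```python
from itertools import product
from typing import List


def expected_support(probs: List[float]) -> float:
    """Expected number of sequences containing the pattern."""
    return sum(probs)


def frequentness_probability(probs: List[float], min_sup: int) -> float:
    """P(support >= min_sup), assuming independent containment events.

    probs[i] is the (assumed known) probability that sequence i
    contains the pattern.
    """
    if min_sup <= 0:
        return 1.0
    # dp[k] for k < min_sup: probability that exactly k sequences seen so far
    # contain the pattern. dp[min_sup] absorbs "min_sup or more".
    dp = [0.0] * (min_sup + 1)
    dp[0] = 1.0
    for p in probs:
        # Update from high k to low k so each step reads the previous round's values.
        dp[min_sup] += dp[min_sup - 1] * p
        for k in range(min_sup - 1, 0, -1):
            dp[k] = dp[k] * (1.0 - p) + dp[k - 1] * p
        dp[0] *= 1.0 - p
    return dp[min_sup]


def frequentness_probability_bruteforce(probs: List[float], min_sup: int) -> float:
    """Same quantity by enumerating all 2^n possible worlds (exponential cost)."""
    total = 0.0
    for world in product([False, True], repeat=len(probs)):
        weight = 1.0
        for present, p in zip(world, probs):
            weight *= p if present else 1.0 - p
        if sum(world) >= min_sup:
            total += weight
    return total


# Two hypothetical databases with the same expected support (2.0)
# but very different chances of actually reaching a support of 2.
certain = [1.0, 1.0, 0.0, 0.0]
uncertain = [0.5, 0.5, 0.5, 0.5]
```

With these inputs, both lists have an expected support of 2.0, yet `frequentness_probability(certain, 2)` is 1.0 while `frequentness_probability(uncertain, 2)` is 0.6875. A pattern would then be reported as probabilistically frequent only if this probability meets a user-chosen probability threshold, which is the kind of distinction expected support alone cannot express.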