DPSP: distributed progressive sequential pattern mining on the cloud

  • Authors:
  • Jen-Wei Huang;Su-Chen Lin;Ming-Syan Chen

  • Affiliations:
  • Yuan Ze University, Taiwan;National Taiwan University, Taiwan;National Taiwan University, Taiwan

  • Venue:
  • PAKDD'10 Proceedings of the 14th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part II
  • Year:
  • 2010

Abstract

The progressive sequential pattern mining problem has been discussed in previous research works. With the increasing amount of data, single processors struggle to scale up. Traditional algorithms running on a single machine may have scalability troubles. Therefore, mining progressive sequential patterns intrinsically suffers from the scalability problem. In view of this, we design a distributed mining algorithm to address the scalability problem of mining progressive sequential patterns. The proposed algorithm DPSP, standing for Distributed Progressive Sequential Pattern mining algorithm, is implemented on top of the Hadoop platform, which realizes the cloud computing environment. We propose Map/Reduce jobs in DPSP to delete obsolete itemsets, update current candidate sequential patterns, and report up-to-date frequent sequential patterns within each period of interest (POI). The experimental results show that DPSP possesses great scalability and consequently increases the performance and the practicability of mining algorithms.
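
The three tasks named in the abstract (deleting obsolete itemsets, updating candidate patterns, and reporting frequent patterns within a POI) can be illustrated with a minimal in-memory sketch. This is not the DPSP algorithm or its Hadoop implementation, whose details the abstract does not give. It is a simplified illustration under stated assumptions: the POI is a sliding time window of length `poi` ending at `now`, candidates are limited to patterns of one or two single items, and each sequence has distinct timestamps. The function names `map_sequence`, `reduce_counts`, and `frequent_patterns` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_sequence(elements, now, poi):
    """Map step (illustrative): prune obsolete itemsets, emit (pattern, 1)."""
    # Assumption: an itemset is obsolete when its timestamp falls outside
    # the window (now - poi, now].
    live = sorted(
        ((t, items) for t, items in elements if now - poi < t <= now),
        key=lambda e: e[0],
    )
    patterns = set()
    # Length-1 candidates: every item still inside the window.
    for _, items in live:
        for item in items:
            patterns.add((item,))
    # Length-2 candidates: an item followed later by another item.
    for (_, first), (_, second) in combinations(live, 2):
        for a in first:
            for b in second:
                patterns.add((a, b))
    # Each sequence contributes at most 1 to the support of a pattern.
    return [(p, 1) for p in patterns]


def reduce_counts(pairs, min_support):
    """Reduce step (illustrative): sum supports and keep frequent patterns."""
    support = defaultdict(int)
    for pattern, count in pairs:
        support[pattern] += count
    return {p: s for p, s in support.items() if s >= min_support}


def frequent_patterns(database, now, poi, min_support):
    """Run the map step over every sequence, then a single reduce."""
    pairs = []
    for elements in database.values():
        pairs.extend(map_sequence(elements, now, poi))
    return reduce_counts(pairs, min_support)


# Hypothetical toy database: sequence id -> list of (timestamp, itemset).
database = {
    "s1": [(1, {"A"}), (2, {"B"}), (3, {"C"})],
    "s2": [(2, {"A"}), (3, {"B"})],
    "s3": [(1, {"B"}), (3, {"A"})],
}

# With poi=2 at time 3, itemsets stamped 1 are obsolete and are dropped.
result = frequent_patterns(database, now=3, poi=2, min_support=2)
```

In this toy setup, each sequence is processed independently in the map step, which is what would let the work be spread across machines; only the per-pattern counts need to be brought together in the reduce step.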