e-NSP: efficient negative sequential pattern mining based on identified positive patterns without database rescanning

  • Authors:
  • Xiangjun Dong;Zhigang Zheng;Longbing Cao;Yanchang Zhao;Chengqi Zhang;Jinjiu Li;Wei Wei;Yuming Ou

  • Affiliations:
  • Shandong Polytechnic University, Jinan, China;University of Technology, Sydney, Sydney, Australia;University of Technology, Sydney, Sydney, Australia;Centrelink, Sydney, Australia;University of Technology, Sydney, Sydney, Australia;University of Technology, Sydney, Sydney, Australia;University of Technology, Sydney, Sydney, Australia;University of Technology, Sydney, Sydney, Australia

  • Venue:
  • Proceedings of the 20th ACM international conference on Information and knowledge management
  • Year:
  • 2011

Abstract

Mining Negative Sequential Patterns (NSP) is much more challenging than mining Positive Sequential Patterns (PSP) because calculating Negative Sequential Candidates (NSC) involves high computational complexity and a huge search space. The few existing approaches to NSP mining mainly rely on re-scanning the database after identifying PSP and are therefore very inefficient. In this paper, we propose e-NSP, an efficient algorithm that mines NSP using only the identified PSP, without re-scanning the database. First, negative containment is defined to determine whether or not a data sequence contains a negative sequence. Second, an efficient approach is proposed to convert the negative containment problem into a positive containment problem. The supports of NSC are then calculated based only on the supports of the corresponding PSP. Finally, a simple but efficient approach is proposed to generate NSC. With e-NSP, mining NSP does not require additional database scans, and existing PSP mining algorithms can be integrated into e-NSP to mine NSP efficiently. e-NSP is compared with two currently available NSP mining algorithms on 14 synthetic and real-life datasets. Extensive experiments show that e-NSP takes as little as 3% of the runtime of the baseline approaches and is applicable to efficient mining of NSP in large datasets.
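The abstract does not spell out how negative containment is converted to positive containment, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes the simplest case, a candidate with one negated single-item element, and it assumes that a data sequence contains <a ¬b c> exactly when it contains <a c> but does not contain <a b c>. Under that assumption the support of the negative candidate is a difference of two positive supports. The toy database and the names contains, support, and negative_support are hypothetical and are not taken from the paper.

```python
from typing import Dict, Sequence, Tuple

Pattern = Tuple[str, ...]


def contains(data: Sequence[str], pattern: Sequence[str]) -> bool:
    """Return True if `pattern` is an order-preserving subsequence of `data`."""
    it = iter(data)
    # `item in it` advances the iterator, so later items must occur later.
    return all(item in it for item in pattern)


def support(db: Sequence[Sequence[str]], pattern: Sequence[str]) -> int:
    """Count the data sequences that contain `pattern` (one database scan)."""
    return sum(contains(seq, pattern) for seq in db)


# Toy database, made up purely for illustration.
db = [
    ["a", "b", "c"],
    ["a", "c"],
    ["a", "d", "c"],
    ["b", "c"],
    ["a", "b", "d", "c"],
]

# Stand-in for the output of any PSP miner: positive patterns with supports.
psp_support: Dict[Pattern, int] = {
    ("a", "c"): support(db, ("a", "c")),
    ("a", "b", "c"): support(db, ("a", "b", "c")),
}


def negative_support(
    psp: Dict[Pattern, int], positive_part: Pattern, negation_made_positive: Pattern
) -> int:
    """Assumed single-negation case: sup(<a ¬b c>) = sup(<a c>) - sup(<a b c>).

    Only a dictionary lookup is needed; the database is not scanned again.
    """
    return psp[positive_part] - psp[negation_made_positive]


sup_neg = negative_support(psp_support, ("a", "c"), ("a", "b", "c"))

# Cross-check against a direct scan that applies the assumed definition.
direct = sum(
    contains(seq, ("a", "c")) and not contains(seq, ("a", "b", "c")) for seq in db
)
```

The subtraction is valid here because every data sequence that contains <a b c> also contains <a c>, so the sequences counted by the second support are a subset of those counted by the first. Candidates with several negated elements need more than a single subtraction and are outside the scope of this sketch.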