Mining high utility episodes in complex event sequences

  • Authors:
  • Cheng-Wei Wu;Yu-Feng Lin;Philip S. Yu;Vincent S. Tseng

  • Affiliations:
  • Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, ROC;Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, ROC;Department of Computer Science, University of Illinois at Chicago, Chicago, Illinois, USA;Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, ROC

  • Venue:
  • Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2013

Abstract

Frequent episode mining (FEM) is an interesting research topic in data mining with a wide range of applications. However, the traditional FEM framework treats all events as having the same importance/utility and assumes that the same type of event appears at most once at any time point. These simplifying assumptions do not reflect real application scenarios, so useful information about episodes in terms of utilities such as profits is lost. Furthermore, most studies on FEM focus on mining episodes in simple event sequences, and few consider complex event sequences, where different events can occur simultaneously. To address these issues, we incorporate the concept of utility into episode mining and address the new problem of mining high utility episodes from complex event sequences, which has not been explored so far. In the proposed framework, the importance/utility of different events is considered, and multiple events can appear simultaneously. Several novel features are incorporated into the framework to resolve the challenges raised by this new problem, such as the absence of the anti-monotone property and the huge set of candidate episodes. Moreover, an efficient algorithm named UP-Span (Utility ePisodes mining by Spanning prefixes) is proposed for mining high utility episodes; it incorporates several strategies for pruning the search space to achieve high efficiency. Experimental results on real and synthetic datasets show that UP-Span has excellent performance and serves as an effective solution to the new problem of mining high utility episodes from complex event sequences.
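
The abstract does not give formal definitions, so the following is only a toy sketch of why utility lacks the anti-monotone property. It assumes a simplified model that is not taken from the paper: a complex event sequence is a mapping from time points to the events (with their utilities) occurring at that time, an episode is a serial list of event types, and an episode's utility is the summed utility of its earliest-match occurrences within a time window. The names `episode_occurrences`, `episode_utility`, `max_span`, and the sample data are hypothetical, and this is not the UP-Span algorithm.

```python
from typing import Dict, List, Tuple

# Assumed toy model: time point -> {event type: utility of that event}.
# Several events may share one time point (a "complex" event sequence).
EventSequence = Dict[int, Dict[str, int]]


def episode_occurrences(
    seq: EventSequence, episode: Tuple[str, ...], max_span: int
) -> List[Tuple[int, int, int]]:
    """Return (start, end, utility) for each occurrence of a serial episode.

    From every time point holding the first event, the remaining events are
    matched greedily at the earliest later time point, and all matched time
    points must lie in the window [start, start + max_span).
    """
    times = sorted(seq)
    found = []
    for start in times:
        if episode[0] not in seq[start]:
            continue
        total = seq[start][episode[0]]
        prev = start
        matched = True
        for event in episode[1:]:
            # Earliest later time point inside the window that holds `event`.
            nxt = next(
                (t for t in times
                 if prev < t < start + max_span and event in seq[t]),
                None,
            )
            if nxt is None:
                matched = False
                break
            total += seq[nxt][event]
            prev = nxt
        if matched:
            found.append((start, prev, total))
    return found


def episode_utility(
    seq: EventSequence, episode: Tuple[str, ...], max_span: int
) -> int:
    """Sum the utilities of all occurrences found by episode_occurrences."""
    return sum(u for _, _, u in episode_occurrences(seq, episode, max_span))


# Hypothetical data: A and B occur simultaneously at time 1.
seq = {1: {"A": 1, "B": 2}, 2: {"C": 10}, 3: {"A": 1}, 4: {"C": 10}}

low = episode_utility(seq, ("A",), max_span=2)        # 1 + 1 = 2
high = episode_utility(seq, ("A", "C"), max_span=2)   # 11 + 11 = 22
print(low, high)
```

With a minimum utility threshold of, say, 20, the episode `("A",)` falls below the threshold while its extension `("A", "C")` exceeds it. Under this toy model, a low-utility episode therefore cannot simply be discarded along with all of its extensions, which is the kind of difficulty the abstract refers to when it mentions the absence of the anti-monotone property.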