BIDE-based parallel mining of frequent closed sequences with MapReduce
ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part II
Frequent serial episodes within an event sequence describe the behavior of users or systems in an application. Existing mining algorithms calculate the frequency of an episode from overlapping or non-minimal occurrences, which tends either to over-count the support of long episodes or to characterize poorly the followed-by-closely relationship among event types. In addition, because they use the Apriori-style level-wise approach, these algorithms are computationally expensive. In this paper, we propose an efficient algorithm, MANEPI (Minimal And Non-overlapping EPIsode), for mining more interesting frequent episodes within a given event sequence. The proposed frequency measure takes both minimal and non-overlapping occurrences of an episode into consideration and ensures better mining quality. The depth-first search strategy we introduce, which uses the Apriori property to perform episode growth, greatly improves the efficiency of mining long episodes because it scans the given sequence only once and does not generate candidate episodes. Moreover, an optimization technique is presented to narrow the search space and speed up the mining process. Experimental evaluation on both synthetic and real-world datasets demonstrates that our algorithms are more efficient and effective.
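To make the frequency measure concrete, the following is a minimal sketch of one plausible reading of it, not the MANEPI algorithm itself. The abstract gives no formal definitions, so the sketch assumes the following: events are identified by their index in the sequence; an occurrence of a serial episode is an interval in which its event types appear in order; an occurrence is minimal if no other occurrence lies in a proper sub-interval; and two occurrences are non-overlapping if their intervals are disjoint. The function names `minimal_occurrences` and `non_overlapping_support` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def minimal_occurrences(sequence: Sequence[str],
                        episode: Sequence[str]) -> List[Tuple[int, int]]:
    """Return the (start, end) index pairs of all minimal occurrences of a
    serial episode, found in a single left-to-right scan of the sequence."""
    n = len(episode)
    if n == 0:
        return []
    # latest_start[k] is the latest start index of any occurrence of the
    # prefix episode[0..k] that ends at or before the current position.
    latest_start: List[Optional[int]] = [None] * n
    result: List[Tuple[int, int]] = []
    last_emitted_start = -1
    for pos, event in enumerate(sequence):
        # Walk prefixes from longest to shortest so that a single event is
        # never used for two positions of the episode.
        for k in range(n - 1, -1, -1):
            if episode[k] != event:
                continue
            if k == 0:
                latest_start[0] = pos
            elif latest_start[k - 1] is not None:
                latest_start[k] = latest_start[k - 1]
            else:
                continue
            # A full occurrence ending here is minimal only if it starts
            # strictly later than the previous minimal occurrence did.
            if k == n - 1 and latest_start[k] > last_emitted_start:
                result.append((latest_start[k], pos))
                last_emitted_start = latest_start[k]
    return result


def non_overlapping_support(occurrences: List[Tuple[int, int]]) -> int:
    """Greedily count pairwise-disjoint intervals. The input is assumed to
    be sorted by end index, as minimal_occurrences returns it."""
    count = 0
    last_end = -1
    for start, end in occurrences:
        if start > last_end:
            count += 1
            last_end = end
    return count


if __name__ == "__main__":
    seq = list("ABABA")
    occs = minimal_occurrences(seq, ("A", "B", "A"))
    print(occs, non_overlapping_support(occs))
```

In the sequence A B A B A, the episode (A, B, A) has two minimal occurrences under these assumptions, but they share the middle A, so only one of them is counted. This shows how restricting the count to minimal, non-overlapping occurrences can keep the support of longer episodes from being inflated.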