CMRules: Mining sequential rules common to several sequences

  • Authors:
  • Philippe Fournier-Viger;Usef Faghihi;Roger Nkambou;Engelbert Mephu Nguifo

  • Affiliations:
  • Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan;Department of Computer Science, University of Memphis, TN, USA;Department of Computer Science, University of Quebec at Montreal, 201, avenue Président-Kennedy, Montréal, Canada;Clermont Université, Université Blaise Pascal, LIMOS, BP 10448, F-63000 Clermont-Ferrand, France and CNRS, UMR 6158, LIMOS, F-63173 Aubiere, France

  • Venue:
  • Knowledge-Based Systems
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Sequential rule mining is an important data mining task used in a wide range of applications. However, current algorithms for discovering sequential rules common to several sequences use very restrictive definitions of sequential rules, which make them unable to recognize that similar rules can describe a same phenomenon. This can have many undesirable effects such as (1) similar rules that are rated differently, (2) rules that are not found because they are considered uninteresting when taken individually, (3) and rules that are too specific, which makes them less likely to be used for making predictions. In this paper, we address these problems by proposing a more general form of sequential rules such that items in the antecedent and in the consequent of each rule are unordered. We propose an algorithm named CMRules for mining this form of rules. The algorithm proceeds by first finding association rules to prune the search space for items that occur jointly in many sequences. Then it eliminates association rules that do not meet the minimum confidence and support thresholds according to the sequential ordering. We evaluate the performance of CMRules in three different ways. First, we provide an analysis of its time complexity. Second, we compare its performance (in terms of execution time, memory usage and scalability) with an adaptation of an algorithm from the literature that we name CMDeo. For this comparison, we use three real-life public datasets, which have different characteristics and represent three kinds of data. In many cases, results show that CMRules is faster and has a better scalability for low support thresholds than CMDeo. Lastly, we report a successful application of the algorithm in a tutoring agent.