Genetic process mining: an experimental evaluation

  • Authors:
  • A. K. Medeiros;A. J. Weijters;W. M. Aalst

  • Affiliations:
  • Department of Technology Management, Eindhoven University of Technology, Eindhoven, The Netherlands 5600 MB;Department of Technology Management, Eindhoven University of Technology, Eindhoven, The Netherlands 5600 MB;Department of Technology Management, Eindhoven University of Technology, Eindhoven, The Netherlands 5600 MB

  • Venue:
  • Data Mining and Knowledge Discovery
  • Year:
  • 2007

Abstract

One of the aims of process mining is to retrieve a process model from an event log. The discovered models can serve as objective starting points during the deployment of process-aware information systems (Dumas et al., eds., Process-Aware Information Systems: Bridging People and Software Through Process Technology. Wiley, New York, 2005) and/or as a feedback mechanism to check prescribed models against enacted ones. However, current techniques have problems when mining processes that contain non-trivial constructs and/or when the logs contain noise. Most of these problems arise because many current techniques rely on local information in the event log. To overcome them, we use genetic algorithms to mine process models. The main motivation is to benefit from the global search performed by this kind of algorithm. The non-trivial constructs are tackled by choosing an internal representation that supports them. Noise is tackled naturally because genetic algorithms are, by definition, robust to noise. The main challenge in a genetic approach is defining a good fitness measure, because it guides the global search performed by the genetic algorithm. This paper explains how the genetic algorithm works. Experiments with synthetic and real-life logs show that the fitness measure indeed leads to the mining of process models that are complete (can reproduce all the behavior in the log) and precise (do not allow for extra behavior that cannot be derived from the event log). The genetic algorithm is implemented as a plug-in in the ProM framework.
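
To make the two fitness requirements concrete, the following minimal Python sketch scores a candidate model against a log by rewarding completeness and penalizing extra behavior. It is an illustration only, not the fitness measure defined in the paper: the model is simplified to a set of allowed "directly follows" pairs, and the function names and the weight `kappa` are assumptions introduced here.

```python
from typing import FrozenSet, List, Tuple

Trace = Tuple[str, ...]
Edge = Tuple[str, str]


def directly_follows(log: List[Trace]) -> FrozenSet[Edge]:
    """Collect every (a, b) pair where b directly follows a in some trace."""
    return frozenset((t[i], t[i + 1]) for t in log for i in range(len(t) - 1))


def completeness(model: FrozenSet[Edge], log: List[Trace]) -> float:
    """Fraction of traces whose every step is allowed by the model."""
    if not log:
        return 0.0
    replayed = sum(
        all((t[i], t[i + 1]) in model for i in range(len(t) - 1)) for t in log
    )
    return replayed / len(log)


def extra_behavior(model: FrozenSet[Edge], log: List[Trace]) -> float:
    """Fraction of model edges that never occur in the log."""
    if not model:
        return 0.0
    return len(model - directly_follows(log)) / len(model)


def fitness(model: FrozenSet[Edge], log: List[Trace], kappa: float = 0.5) -> float:
    """Hypothetical score: reward completeness, penalize extra behavior.

    kappa is an assumed weight, chosen here only for illustration.
    """
    return completeness(model, log) - kappa * extra_behavior(model, log)


# Toy log with a choice between activities b and c.
log = [("a", "b", "d"), ("a", "c", "d")]
exact = directly_follows(log)                 # complete and precise
loose = exact | {("b", "c"), ("c", "b")}      # complete, but allows extra behavior
partial = frozenset({("a", "b"), ("b", "d")})  # precise, but misses one trace

for name, model in [("exact", exact), ("loose", loose), ("partial", partial)]:
    print(name, round(fitness(model, log), 3))
```

In this toy setting, the model that replays every trace without allowing unseen steps scores highest, while the over-general and the incomplete models both score lower. A genetic algorithm would use such a score to rank the individuals in each generation.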