Distributed genetic process mining using sampling

  • Authors:
  • Carmen Bratosin;Natalia Sidorova;Wil van der Aalst

  • Affiliations:
  • Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands;Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands;Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands

  • Venue:
  • PaCT'11 Proceedings of the 11th international conference on Parallel computing technologies
  • Year:
  • 2011

Abstract

Process mining aims to discover process models from event logs. Complex constructs, noise and infrequent behavior make process mining a difficult problem. A genetic mining algorithm, which applies genetic operators to search the space of all possible process models, can successfully deal with these challenges. In this paper, we reduce the computation time by using a distributed setting. The population is distributed between the islands of a computer network (e.g., a grid). To further accelerate the method, we use sample-based fitness evaluations, i.e., we evaluate the individuals on a sample of the event log instead of the entire event log, gradually increasing the sample size if necessary. Our experiments show that both sampling and distributing the event log significantly improve the performance. The actual speed-up depends strongly on the combination of population size and sample size.
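
The idea of sample-based fitness evaluation can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the model representation, the fitness measure, or the rule for growing the sample. The sketch assumes a toy model given as a set of allowed directly-follows activity pairs, a fitness equal to the fraction of traces the model can replay, and a sample size that doubles at each step. All function names are hypothetical.

```python
import random


def replay_fitness(model, traces):
    """Fraction of traces whose consecutive activity pairs are all allowed by the model."""
    if not traces:
        return 0.0
    replayed = sum(all(pair in model for pair in zip(t, t[1:])) for t in traces)
    return replayed / len(traces)


def sampled_fitness(model, log, sample_size, rng):
    """Evaluate an individual on a random sample of the log instead of the full log."""
    size = min(sample_size, len(log))
    return replay_fitness(model, rng.sample(log, size))


def next_sample_size(current, log_size, factor=2):
    """Assumed growth schedule: multiply the sample size, capped at the full log."""
    return min(log_size, current * factor)


# Toy event log: 8 traces follow a-b-c, 2 traces follow a-c-b.
log = [("a", "b", "c")] * 8 + [("a", "c", "b")] * 2
# Toy individual: allows only a->b and b->c.
model = {("a", "b"), ("b", "c")}

rng = random.Random(0)
size = 2
history = []  # (sample size, fitness estimate) per step
while True:
    history.append((size, sampled_fitness(model, log, size, rng)))
    if size == len(log):
        break
    size = next_sample_size(size, len(log))
```

Small samples give cheap but noisy fitness estimates. Once the sample covers the whole log, the estimate equals the full-log fitness, which is why the sample is grown only when needed.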