Distributed genetic process mining using sampling

Authors:
Carmen Bratosin, Natalia Sidorova, Wil Van Der Aalst
Affiliations:
Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands (all three authors)
Venue:
PaCT'11: Proceedings of the 11th International Conference on Parallel Computing Technologies
Year:
2011

Abstract

Process mining aims at discovering process models from event logs. Complex constructs, noise and infrequent behavior make process mining a challenging problem. A genetic mining algorithm, which applies genetic operators to search the space of all possible process models, can successfully deal with these challenges, but it is computationally expensive. In this paper, we reduce the computation time by using a distributed setting: the population is distributed among the islands of a computer network (e.g., a grid). To further accelerate the method, we use sample-based fitness evaluation, i.e., we evaluate the individuals on a sample of the event log instead of the entire event log, gradually increasing the sample size if necessary. Our experiments show that both sampling and distributing the event log significantly improve performance. The actual speed-up depends strongly on the combination of population size and sample size.
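The abstract combines two ideas: an island model, where sub-populations evolve separately and exchange individuals, and sample-based fitness evaluation, where the sample grows only when the result is not good enough on the full log. The Python sketch below shows one way these two ideas could fit together. It is not the authors' implementation, and it rests on several assumptions. Individuals are a toy stand-in (sets of allowed direct successions between activities), not the process-model representation the genetic miner actually uses. The fitness function, ring migration, doubling schedule, target replay rate and all parameter values are hypothetical. The islands are simulated one after another in a single process rather than run on a grid.

```python
import random
from typing import FrozenSet, List, Sequence, Tuple

Trace = Tuple[str, ...]
Pair = Tuple[str, str]
Model = FrozenSet[Pair]  # hypothetical toy individual: allowed direct successions


def pairs(trace: Trace) -> List[Pair]:
    """Consecutive activity pairs of a trace, padded with START/END markers."""
    padded = ("START",) + tuple(trace) + ("END",)
    return list(zip(padded, padded[1:]))


def replay_rate(model: Model, traces: Sequence[Trace]) -> float:
    """Share of traces whose every step is allowed by the model."""
    return sum(all(p in model for p in pairs(t)) for t in traces) / len(traces)


def fitness(model: Model, traces: Sequence[Trace], universe_size: int) -> float:
    """Hypothetical fitness: allowed steps plus fully replayed traces, minus a size penalty."""
    steps = [p for t in traces for p in pairs(t)]
    step_rate = sum(p in model for p in steps) / len(steps)
    penalty = 0.1 * len(model) / universe_size  # crude preference for smaller models
    return 0.5 * step_rate + 0.5 * replay_rate(model, traces) - penalty


def crossover(a: Model, b: Model, rng: random.Random) -> Model:
    """Uniform crossover: shared pairs are kept, the others are inherited with p = 0.5."""
    return frozenset(p for p in a | b if (p in a and p in b) or rng.random() < 0.5)


def mutate(model: Model, universe: List[Pair], rng: random.Random) -> Model:
    """Toggle one randomly chosen pair in or out of the model."""
    p = rng.choice(universe)
    return model - {p} if p in model else model | {p}


def evolve(pop, sample, universe, rng, generations):
    """Truncation selection with elitism on one island, scored on the sample only."""
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(m, sample, len(universe)), reverse=True)
        elite = ranked[: len(pop) // 2]
        children = []
        while len(elite) + len(children) < len(pop):
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, rng), universe, rng))
        pop = elite + children
    return pop


def migrate(islands, sample, universe_size):
    """Hypothetical ring migration: each island's best replaces the next island's worst."""
    def score(m):
        return fitness(m, sample, universe_size)
    bests = [max(pop, key=score) for pop in islands]
    for i, pop in enumerate(islands):
        worst = min(range(len(pop)), key=lambda j: score(pop[j]))
        pop[worst] = bests[i - 1]  # i - 1 == -1 wraps around to the last island


def mine(log, n_islands=4, pop_size=20, sample_size=10, target=0.95,
         epochs=3, generations=10, seed=0):
    """Island GA with sample-based fitness. The sample doubles until the best model
    replays `target` of the full log or the sample covers the whole log."""
    rng = random.Random(seed)
    acts = sorted({a for t in log for a in t})
    universe = [(x, y) for x in ["START"] + acts for y in acts + ["END"]]
    islands = [[frozenset(p for p in universe if rng.random() < 0.5)
                for _ in range(pop_size)] for _ in range(n_islands)]
    while True:
        sample = rng.sample(log, min(sample_size, len(log)))
        for _ in range(epochs):
            # Simulated sequentially here; on a grid each island would run on its own node.
            islands = [evolve(pop, sample, universe, rng, generations) for pop in islands]
            migrate(islands, sample, len(universe))
        # Only the final check uses the entire log, which is the expensive step.
        best = max((m for pop in islands for m in pop), key=lambda m: replay_rate(m, log))
        if replay_rate(best, log) >= target or sample_size >= len(log):
            return best, sample_size
        sample_size *= 2


if __name__ == "__main__":
    log = [("a", "b", "c", "d")] * 30 + [("a", "c", "b", "d")] * 15 + [("a", "e", "d")] * 5
    best, used = mine(log)
    print(sorted(best), used, replay_rate(best, log))
```

In this sketch, every generation scores individuals only on the sample. The full log is consulted once per round to decide whether to enlarge the sample, so the cost per generation is governed by the sample size and the population size. This is consistent with the abstract's observation that the speed-up depends on how those two parameters are combined.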