Discovering process models with genetic algorithms using sampling

Authors:
Carmen Bratosin; Natalia Sidorova; Wil van der Aalst
Affiliations:
Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands (all three authors)
Venue:
KES'10: Proceedings of the 14th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, Part I
Year:
2010

Abstract

Process mining, a new business intelligence area, aims at discovering process models from event logs. Complex constructs, noise, and infrequent behavior make process mining a hard problem. A genetic mining algorithm, which applies genetic operators to search the space of all possible process models, handles these challenges successfully. Its drawback is high computation time, caused by the cost of fitness evaluation, which depends linearly on the number of process instances in the log. By using a sampling-based approach, i.e., evaluating fitness on a sample of the log instead of the whole log, we drastically reduce the computation time. When the desired fitness is achieved on the sample, we check the fitness on the whole log; if it has not yet been achieved there, we increase the sample size and continue the computation iteratively. Our experiments show that sampling works well even for relatively small logs, and that the total computation time is reduced by a factor of 6 to 15.
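
The iterative scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the model representation (a set of directly-follows pairs), the toy fitness function, the mutation operator, the elitist selection, and the parameters `initial_size`, `growth`, `pop_size`, and `max_generations` are all hypothetical choices made only to keep the example self-contained.

```python
import random

def trace_fitness(model, trace):
    """Toy fitness: share of the trace's directly-follows steps found in the model."""
    steps = list(zip(trace, trace[1:]))
    if not steps:
        return 1.0
    return sum(step in model for step in steps) / len(steps)

def log_fitness(model, traces):
    """Average trace fitness; the cost grows linearly with len(traces)."""
    return sum(trace_fitness(model, t) for t in traces) / len(traces)

def mutate(model, activities, rng):
    """Toy genetic operator (assumption): toggle one random directly-follows pair."""
    pair = (rng.choice(activities), rng.choice(activities))
    return model ^ {pair}  # frozenset ^ set yields a frozenset

def discover_with_sampling(log, target=1.0, initial_size=4, growth=2,
                           pop_size=20, max_generations=2000, seed=0):
    rng = random.Random(seed)
    activities = sorted({a for t in log for a in t})
    population = [frozenset() for _ in range(pop_size)]
    size = min(initial_size, len(log))
    sample = rng.sample(log, size)
    for _ in range(max_generations):
        # Rank individuals by fitness on the sample only (the cheap evaluation).
        population.sort(key=lambda m: log_fitness(m, sample), reverse=True)
        best = population[0]
        if log_fitness(best, sample) >= target:
            # Target reached on the sample: verify it on the whole log.
            if log_fitness(best, log) >= target:
                return best, size
            # Not reached on the whole log: enlarge the sample and keep evolving.
            size = min(size * growth, len(log))
            sample = rng.sample(log, size)
            continue
        # Elitist step (assumption): keep the top half, refill with mutants.
        survivors = population[: pop_size // 2]
        population = survivors + [mutate(m, activities, rng) for m in survivors]
    return population[0], size

# Hypothetical event log: each trace is a tuple of activity names.
log = [("a", "b", "c", "d"), ("a", "c", "b", "d"),
       ("a", "b", "d"), ("a", "c", "d")] * 5
model, final_size = discover_with_sampling(log)
print("sample size at termination:", final_size)
print("fitness on whole log:", log_fitness(model, log))
```

Most generations in this sketch touch only the sample, and the full log is consulted only when the sample suggests the target has been met, which is where the time saving reported in the abstract would come from. How the sample is grown (here, doubling) is an illustrative choice.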