Process mining aims at discovering process models from event logs. Complex constructs, noise and infrequent behavior are issues that make process mining a challenging problem. A genetic mining algorithm, which applies genetic operators to search the space of all possible process models, can successfully deal with these challenges. In this paper, we reduce its computation time by using a distributed setting: the population is distributed over islands running on the nodes of a computer network (e.g., a grid). To further accelerate the method, we use sample-based fitness evaluations, i.e., we evaluate the individuals on a sample of the event log instead of the entire event log, gradually increasing the sample size if necessary. Our experiments show that both sampling and distributing the event log significantly improve performance. The actual speed-up depends strongly on the combination of population size and sample size.
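The combination of an island model with sample-based fitness evaluation can be illustrated with a small toy. This is a minimal sketch under several assumptions, not the authors' genetic miner: a model is a hypothetical `frozenset` of allowed direct-follows pairs rather than a causal matrix, fitness is the share of fitting traces minus an assumed penalty per pair, the islands run one after another in a single process instead of on grid nodes, and the rule for growing the sample (double it when the current best individual scores more than `tolerance` differently on the full log than on the sample) is an assumption, since the abstract does not state the criterion. All function and parameter names (`trace_fits`, `fitness`, `mutate`, `evolve_islands`, `tolerance`, `migrate_every`) are hypothetical.

```python
import random

# Toy stand-ins (hypothetical): a trace is a tuple of activity names and a
# model is a frozenset of allowed direct-follows pairs (a, b). This is NOT the
# causal-matrix representation used in genetic process mining.


def trace_fits(model, trace):
    """A trace fits if every consecutive pair of activities is allowed."""
    return all((a, b) in model for a, b in zip(trace, trace[1:]))


def fitness(model, traces):
    """Share of fitting traces minus an assumed penalty of 0.01 per pair."""
    replay = sum(trace_fits(model, t) for t in traces) / len(traces)
    return replay - 0.01 * len(model)


def mutate(model, pairs, rng):
    """Toggle one randomly chosen pair in or out of the model."""
    return model ^ {rng.choice(pairs)}


def evolve_islands(log, n_islands=4, pop_size=10, generations=60,
                   sample_size=8, tolerance=0.05, migrate_every=10, seed=0):
    rng = random.Random(seed)
    acts = sorted({a for t in log for a in t})
    pairs = [(a, b) for a in acts for b in acts]
    # Each island holds its own population of random initial models. Here the
    # islands run one after another; in a grid each would run on its own node.
    islands = [[frozenset(p for p in pairs if rng.random() < 0.5)
                for _ in range(pop_size)] for _ in range(n_islands)]
    for gen in range(generations):
        # Sample-based fitness: score on a random sample, not the whole log.
        sample = rng.sample(log, min(sample_size, len(log)))
        for i, pop in enumerate(islands):
            offspring = [mutate(rng.choice(pop), pairs, rng)
                         for _ in range(pop_size)]
            ranked = sorted(pop + offspring,
                            key=lambda m: fitness(m, sample), reverse=True)
            islands[i] = ranked[:pop_size]  # keep the best pop_size models
        # Assumed growth rule: if the best model scores noticeably differently
        # on the full log than on the sample, the sample is too small; double it.
        # Only this one model is evaluated on the full log per generation.
        best = max((pop[0] for pop in islands), key=lambda m: fitness(m, sample))
        if abs(fitness(best, sample) - fitness(best, log)) > tolerance:
            sample_size = min(2 * sample_size, len(log))
        # Ring migration: each island's best replaces the next island's worst.
        if (gen + 1) % migrate_every == 0:
            emigrants = [pop[0] for pop in islands]
            for i, pop in enumerate(islands):
                pop[-1] = emigrants[i - 1]
    # The final choice is made on the full event log.
    best = max((m for pop in islands for m in pop), key=lambda m: fitness(m, log))
    return best, sample_size


log = ([("a", "b", "c", "d")] * 30 + [("a", "c", "b", "d")] * 20
       + [("a", "e", "d")] * 5)  # the last 5 traces are infrequent behavior
best, final_sample_size = evolve_islands(log)
print(sorted(best), final_sample_size)
```

In this sketch only one individual per generation is scored on the full log, so the check that drives sample growth stays cheap compared with evaluating whole populations on every trace. The trade-off the abstract mentions shows up in the parameters: a larger `pop_size` or `sample_size` makes each generation more expensive, while a sample that is too small lets misleading fitness values steer the search until the sample grows.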