Process mining, a new area of business intelligence, aims to discover process models from event logs. Complex constructs, noise, and infrequent behavior make this a hard problem. A genetic mining algorithm, which applies genetic operators to search the space of all possible process models, handles these challenges successfully. Its drawback is high computation time, caused by the cost of fitness evaluation: evaluation time depends linearly on the number of process instances in the log. By using a sampling-based approach, i.e. evaluating fitness on a sample of the log instead of the whole log, we drastically reduce the computation time. Once the desired fitness is achieved on the sample, we check the fitness on the whole log; if it has not yet been achieved there, we increase the sample size and continue the computation iteratively. Our experiments show that sampling works well even for relatively small logs, reducing the total computation time by a factor of 6 to 15.
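The iterative scheme described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `mine_with_sampling`, `search`, `fitness`, `growth`, and the toy model (a set of allowed traces) are all assumptions made for the example. The abstract does not say how the sample is drawn or by how much it grows, so uniform random sampling and a doubling schedule are assumed here.

```python
import random
from typing import Callable, Optional, Sequence, Tuple, TypeVar

Trace = TypeVar("Trace")
Model = TypeVar("Model")


def mine_with_sampling(
    log: Sequence[Trace],
    search: Callable[[Sequence[Trace], Optional[Model]], Model],
    fitness: Callable[[Model, Sequence[Trace]], float],
    target: float,
    initial_size: int,
    growth: float = 2.0,  # assumed growth schedule, not from the paper
    seed: int = 0,
) -> Tuple[Model, int]:
    """Search on a growing sample; return (model, final sample size)."""
    rng = random.Random(seed)
    traces = list(log)
    size = max(1, min(initial_size, len(traces)))
    model: Optional[Model] = None
    while True:
        sample = rng.sample(traces, size)
        # Hypothetical search step (e.g. a genetic miner) that only sees
        # the sample and continues from the previous model, if any.
        model = search(sample, model)
        # Accept once the whole log is covered by the sample, or once the
        # model also reaches the target fitness on the whole log.
        if size == len(traces) or fitness(model, traces) >= target:
            return model, size
        # Otherwise enlarge the sample and continue.
        size = min(len(traces), max(size + 1, int(size * growth)))


# Toy stand-ins: a "model" is just the set of traces it allows.
def toy_fitness(model, traces):
    return sum(t in model for t in traces) / len(traces)


def toy_search(sample, start):
    return set(sample) | (start or set())


log = [("a", "b", "c")] * 90 + [("a", "c", "b")] * 10
model, size = mine_with_sampling(
    log, toy_search, toy_fitness, target=0.95, initial_size=5
)
```

In this sketch the expensive search only ever evaluates candidates on the sample, and the full log is touched once per round for the acceptance check. Because fitness evaluation time grows linearly with the number of process instances, that is where the saving would come from.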