A Study of Quality and Accuracy Trade-offs in Process Mining

  • Authors:
  • Zan Huang, Akhil Kumar

  • Affiliations:
  • Smeal College of Business, Pennsylvania State University, University Park, Pennsylvania 16802

  • Venue:
  • INFORMS Journal on Computing
  • Year:
  • 2012

Abstract

In recent years, many algorithms have been proposed to extract process models from process execution logs. These process models describe the ordering relationships between the tasks in a process in terms of standard constructs like sequence, parallel, choice, and loop. Most algorithms assume that each trace in a log represents a correct execution sequence based on a model. In practice, logs are often noisy, and algorithms designed for correct logs cannot handle noisy ones. In this paper we share key insights from a study of noise in process logs, both real and synthetic. We found that any process log can be explained by a block-structured model with two special structures, self-loop and optional, making it trivial to build a fully accurate process model for any given log, even one containing inaccurate data or noise. Such a model, however, suffers from low quality. By controlling the use of self-loop and optional structures on tasks and blocks of tasks, we can balance the quality-accuracy trade-off and derive high-quality process models that explain a given percentage of the traces in the log. Finally, new quality metrics and a novel quality-based algorithm for extracting models from noisy logs are described. The results of experiments with the algorithm on real and synthetic data are reported and analyzed at length.
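
The tension the abstract describes, that a sufficiently permissive model replays every trace yet conveys little about the process, can be illustrated with a small sketch. The Python example below is not the authors' algorithm or their metrics: the maximally permissive "flower-like" model, the accuracy measure (fraction of replayable traces), and the structural quality penalty are all illustrative assumptions introduced here.

```python
# Illustrative sketch only: a maximally permissive model replays every trace
# (accuracy 1.0) but carries a high structural penalty, i.e., low quality.
# Model encoding, accuracy measure, and penalty are assumptions for this sketch,
# not the paper's definitions.

def flower_model(log):
    """A maximally permissive model: every observed task may be skipped
    (optional) or repeated (self-loop) in any order."""
    alphabet = {task for trace in log for task in trace}
    return {"alphabet": alphabet,
            "self_loops": len(alphabet),
            "optionals": len(alphabet)}

def accuracy(model, log):
    """Fraction of traces the model can replay; the flower-like model
    accepts any ordering of tasks it knows about."""
    fitting = sum(1 for trace in log if set(trace) <= model["alphabet"])
    return fitting / len(log)

def quality_penalty(model):
    """Naive structural penalty: each self-loop or optional construct
    makes the model less informative about task ordering."""
    return model["self_loops"] + model["optionals"]

if __name__ == "__main__":
    log = [
        ["a", "b", "c", "d"],
        ["a", "c", "b", "d"],
        ["a", "b", "b", "d"],   # noisy trace: repeated task
        ["a", "d"],             # noisy trace: skipped tasks
    ]
    m = flower_model(log)
    print("accuracy:", accuracy(m, log))            # 1.0 -- explains every trace
    print("quality penalty:", quality_penalty(m))   # high -- model is uninformative
```

Restricting where self-loop and optional constructs may be applied, as the paper proposes, lowers this kind of structural penalty while still explaining a chosen percentage of the traces.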