Projection approaches to process mining using region-based techniques

  • Authors:
  • Josep Carmona

  • Affiliations:
  • Universitat Politècnica de Catalunya, Barcelona, Spain

  • Venue:
  • Data Mining and Knowledge Discovery
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Traces are everywhere from information systems that store their continuous executions, to any type of health care applications that record each patient's history. The transformation of a set of traces into a mathematical model that can be used for a formal reasoning is therefore of great value. The discovery of process models out of traces is an interesting problem that has received significant attention in the last years. This is a central problem in Process Mining, a novel area which tries to close the cycle between system design and validation, by resorting on methods for the automated discovery, analysis and extension of process models. In this work, algorithms for the derivation of a Petri net from a set of traces are presented. The methods are grounded on the theory of regions, which maps a model in the state-based domain (e.g., an automata) into a model in the event-based domain (e.g., a Petri net). When dealing with large examples, a direct application of the theory of regions will suffer from two problems: one is the state-explosion problem, i.e., the resources required by algorithms that work at the state-level are sometimes prohibitive. This paper introduces decomposition and projection techniques to alleviate the complexity of the region-based algorithms for Petri net discovery, thus extending its applicability to handle large inputs. A second problem is known as the overfitting problem for region-based approaches, which informally means that, in order to represent with high accuracy the trace set, the models obtained are often spaghetti-like. By focusing on special type of processes called conservative and for which an elegant theory and efficient algorithms can be devised, the techniques presented in this paper alleviate the overfitting problem and moreover incorporate structure into the models generated.