Cost-based vectorization of instance-based integration processes

  • Authors:
  • Matthias Boehm; Dirk Habich; Steffen Preissler; Wolfgang Lehner; Uwe Wloka

  • Affiliations:
  • Dresden University of Technology, Database Technology Group, Noethnitzer Str. 46, 01187 Dresden, Germany; Dresden University of Technology, Database Technology Group, Noethnitzer Str. 46, 01187 Dresden, Germany; Dresden University of Technology, Database Technology Group, Noethnitzer Str. 46, 01187 Dresden, Germany; Dresden University of Technology, Database Technology Group, Noethnitzer Str. 46, 01187 Dresden, Germany; Dresden University of Applied Sciences, Database Group, Friedrich-List-Platz 1, 01069 Dresden, Germany

  • Venue:
  • Information Systems
  • Year:
  • 2011

Abstract

Integration processes are workflow-based integration tasks. Their inefficiency is often caused by low resource utilization and significant waiting times for external systems. To overcome these problems, we previously proposed the concept of process vectorization, in which instance-based integration processes are transparently executed with the pipes-and-filters execution model. The term vectorization refers to processing a sequence (vector) of messages by one standing process. Although process vectorization has been shown to achieve a significant throughput improvement, the concept has two major drawbacks. First, the theoretical performance of a vectorized integration process depends mainly on the performance of the most cost-intensive operator. Second, the practical performance depends strongly on the number of threads used and thus on the number of operators. In this paper, we present an advanced optimization approach that addresses both problems. We generalize the vectorization problem and explain how to vectorize process plans in a cost-based manner, taking into account the cost of the individual operators in the form of their execution times. Because the exhaustive computation approach has exponential time complexity, we also provide a heuristic algorithm with linear time complexity. Furthermore, we explain how to apply general cost-based vectorization to multiple process plans, and we discuss periodic re-optimization. Our evaluation shows that message throughput can be significantly increased compared to both instance-based execution and rule-based vectorized execution.
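The idea of grouping operators by cost can be illustrated with a small sketch. It assumes a purely sequential process plan whose operators have known execution costs, and it treats each contiguous group (bucket) of operators as one pipeline thread, so the slowest bucket bounds throughput. The function names `exhaustive_buckets` and `greedy_buckets`, the objective (minimize the maximum bucket cost, then the number of buckets), and the cost values are all hypothetical. This is not the authors' algorithm or cost model, only an illustration of why an exhaustive search over all 2^(m-1) contiguous partitions of m operators is exponential while a single greedy pass is linear.

```python
from itertools import combinations


def bucket_costs(costs, cuts):
    """Sum operator costs within each contiguous bucket delimited by cut positions."""
    bounds = [0, *cuts, len(costs)]
    return [sum(costs[a:b]) for a, b in zip(bounds, bounds[1:])]


def exhaustive_buckets(costs):
    """Hypothetical exhaustive search over all 2^(m-1) contiguous partitions.

    Picks the partition with the smallest maximum bucket cost (the pipeline
    bottleneck) and, among those, the fewest buckets (threads).
    """
    m = len(costs)
    best_key, best = None, None
    for k in range(m):  # k = number of cut points between adjacent operators
        for cuts in combinations(range(1, m), k):
            buckets = bucket_costs(costs, cuts)
            key = (max(buckets), len(buckets))
            if best_key is None or key < best_key:
                best_key, best = key, buckets
    return best


def greedy_buckets(costs):
    """Hypothetical linear-time pass: extend the current bucket while its cost
    stays within that of the most expensive single operator, else open a new one."""
    limit = max(costs)
    buckets = [costs[0]]
    for c in costs[1:]:
        if buckets[-1] + c <= limit:
            buckets[-1] += c
        else:
            buckets.append(c)
    return buckets


costs = [2, 1, 1, 5, 3, 2]  # hypothetical per-operator execution times (ms)
print(exhaustive_buckets(costs))  # [4, 5, 5]
print(greedy_buckets(costs))      # [4, 5, 5]
```

Under these assumptions, both functions group the six operators into three buckets with a bottleneck of 5 ms, so a pipeline would need three threads instead of six while its throughput stays bounded by the single most expensive operator. For this simplified objective the greedy pass happens to reach the same bottleneck and bucket count as the exhaustive search; a richer cost model (for example, one that charges for thread overhead or allows a tolerance above the most expensive operator) is not captured here.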