Cost-Based Vectorization of Instance-Based Integration Processes

  • Authors:
  • Matthias Boehm; Dirk Habich; Steffen Preissler; Wolfgang Lehner; Uwe Wloka

  • Affiliations:
  • Database Group, Dresden University of Applied Sciences; Database Technology Group, Dresden University of Technology; Database Technology Group, Dresden University of Technology; Database Technology Group, Dresden University of Technology; Database Group, Dresden University of Applied Sciences

  • Venue:
  • ADBIS '09 Proceedings of the 13th East European Conference on Advances in Databases and Information Systems
  • Year:
  • 2009

Abstract

The inefficiency of integration processes (an abstraction of workflow-based integration tasks) is often caused by low resource utilization and significant waiting times for external systems. To overcome these problems, we proposed the concept of process vectorization, in which instance-based integration processes are transparently executed with the pipes-and-filters execution model. Here, the term vectorization is used in the sense of processing a sequence (vector) of messages by one standing process. Although process vectorization has been shown to achieve a significant throughput improvement, the concept has two major drawbacks. First, the theoretical performance of a vectorized integration process mainly depends on the performance of the most cost-intensive operator. Second, the practical performance strongly depends on the number of available threads. In this paper, we present an advanced optimization approach that addresses both problems. To this end, we generalize the vectorization problem and explain how to vectorize process plans in a cost-based manner. Because of the exponential complexity of this problem, we provide a heuristic computation approach and formally analyze its optimality. Our evaluation shows that message throughput can be significantly increased compared to both instance-based execution and rule-based process vectorization.
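
The two drawbacks named in the abstract can be made concrete with a small cost model. The following Python sketch is illustrative only and is not the algorithm from the paper: it assumes a purely sequential plan, made-up per-operator costs, and a simple greedy grouping of adjacent operators into "buckets" (one thread each) whose total cost does not exceed that of the most cost-intensive operator. The function names and the cost values are hypothetical.

```python
def instance_based_cost(costs):
    """Time per message when one instance runs all operators in sequence."""
    return sum(costs)


def pipeline_cost(buckets):
    """Steady-state time per message of a pipeline: its slowest bucket."""
    return max(sum(bucket) for bucket in buckets)


def greedy_buckets(costs):
    """Group adjacent operators so that no bucket exceeds the cost of the
    most expensive single operator (assumed heuristic, not the paper's)."""
    cap = max(costs)
    buckets = [[]]
    for cost in costs:
        # Start a new bucket when adding this operator would exceed the cap.
        if buckets[-1] and sum(buckets[-1]) + cost > cap:
            buckets.append([])
        buckets[-1].append(cost)
    return buckets


# Hypothetical operator costs of a six-operator sequential plan.
costs = [1, 2, 6, 1, 1, 3]

full = [[c] for c in costs]      # one thread per operator
grouped = greedy_buckets(costs)  # fewer threads, same bottleneck

print("instance-based:", instance_based_cost(costs), "per message")
print("fully vectorized:", pipeline_cost(full), "per message,", len(full), "threads")
print("cost-based:", pipeline_cost(grouped), "per message,", len(grouped), "threads")
```

Under these assumptions, both pipelines are limited by the same bottleneck operator, but the grouped plan reaches that limit with fewer threads, which is the intuition behind choosing the degree of vectorization by cost rather than by a fixed rule.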