Cost-based vectorization of instance-based integration processes
Information Systems
Integration processes, an abstraction of workflow-based integration tasks, are often inefficient because of low resource utilization and significant waiting times for external systems. To overcome these problems, we proposed the concept of process vectorization, in which instance-based integration processes are transparently executed with the pipes-and-filters execution model. Here, the term vectorization is used in the sense of processing a sequence (vector) of messages by one standing process. Although process vectorization has been shown to achieve a significant throughput improvement, the concept has two major drawbacks. First, the theoretical performance of a vectorized integration process mainly depends on the performance of the most cost-intensive operator. Second, the practical performance strongly depends on the number of available threads. In this paper, we present an advanced optimization approach that addresses these problems. To this end, we generalize the vectorization problem and explain how to vectorize process plans in a cost-based manner. Because the problem has exponential complexity, we provide a heuristic computation approach and formally analyze its optimality. Our evaluation shows that message throughput can be significantly increased compared to both instance-based execution and rule-based process vectorization.
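The two drawbacks named in the abstract can be made concrete with a small, idealized cost model. The sketch below is an illustration only, not the paper's algorithm: the function names, the operator costs, and the greedy merging rule (group adjacent operators into one thread-backed bucket as long as the bucket cost does not exceed the cost of the most expensive operator) are all assumptions made for this example.

```python
from typing import List


def instance_based_time(costs: List[int], n: int) -> int:
    """Idealized time when each of n messages passes through all
    operators before the next message starts (no overlap)."""
    return n * sum(costs)


def pipelined_time(bucket_costs: List[int], n: int) -> int:
    """Idealized pipes-and-filters time: the first message traverses
    every bucket, then one message completes per bottleneck interval."""
    return sum(bucket_costs) + (n - 1) * max(bucket_costs)


def greedy_buckets(costs: List[int]) -> List[List[int]]:
    """Hypothetical heuristic (an assumption, not the paper's method):
    merge adjacent operators into a bucket while the bucket cost stays
    at or below the cost of the most expensive single operator."""
    limit = max(costs)
    buckets = [[costs[0]]]
    for c in costs[1:]:
        if sum(buckets[-1]) + c <= limit:
            buckets[-1].append(c)   # still within the bottleneck bound
        else:
            buckets.append([c])     # start a new bucket (one more thread)
    return buckets


# Made-up per-operator costs for a six-operator process plan.
costs = [1, 2, 6, 1, 1, 3]
buckets = greedy_buckets(costs)
bucket_costs = [sum(b) for b in buckets]

print(buckets)                               # grouping of operators
print(instance_based_time(costs, 100))       # no pipelining
print(pipelined_time(costs, 100))            # one thread per operator
print(pipelined_time(bucket_costs, 100))     # one thread per bucket
```

Under this simplified model, the fully vectorized plan (six threads) and the bucketed plan (three threads) have the same idealized time, because both are bounded by the operator with cost 6. That is the intuition behind trading thread count against the bottleneck operator.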