Extract-transform-load (ETL) workflows model the population of enterprise data warehouses with information gathered from a large variety of heterogeneous data sources. ETL workflows are complex design structures that run under strict performance requirements, and their optimization is crucial for satisfying business objectives. In this paper, we deal with the problem of scheduling the execution of ETL activities (a.k.a. transformations, tasks, operations), with the goal of minimizing ETL execution time and allocated memory. We investigate the effects of four scheduling policies on different flow structures and configurations, and show experimentally that the choice of scheduling policy can improve ETL performance in terms of memory consumption and execution time. First, we examine a simple, fair scheduling policy. Then, we study the pros and cons of two other policies: the first empties the largest input queue of the flow, and the second activates the activity with the maximum tuple consumption rate. Finally, we examine a fourth policy that combines the advantages of the latter two in synergy with flow parallelization.
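To make the first three policies concrete, the following is a minimal sketch of how a scheduler might choose the next activity to run. It is an illustration under stated assumptions, not the paper's implementation: the `Activity` class, its `rate` field (tuples consumed per time unit), and the three `pick_*` functions are hypothetical names introduced here. The fourth, combined policy depends on flow parallelization details that the abstract does not give, so it is not sketched.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class Activity:
    """Hypothetical ETL activity with one input queue of pending tuples."""
    name: str
    rate: float  # assumed: tuples consumed per time unit
    queue: Deque[int] = field(default_factory=deque)


def _ready(activities: List[Activity]) -> List[int]:
    """Indices of activities that have at least one pending tuple."""
    return [i for i, a in enumerate(activities) if a.queue]


def pick_round_robin(activities: List[Activity], last: int) -> Optional[int]:
    """Fair policy: the next activity after `last` that has pending input."""
    n = len(activities)
    for step in range(1, n + 1):
        i = (last + step) % n
        if activities[i].queue:
            return i
    return None


def pick_largest_queue(activities: List[Activity]) -> Optional[int]:
    """Memory-oriented policy: run the activity with the largest input queue."""
    ready = _ready(activities)
    if not ready:
        return None
    return max(ready, key=lambda i: len(activities[i].queue))


def pick_max_rate(activities: List[Activity]) -> Optional[int]:
    """Time-oriented policy: run the ready activity that consumes tuples fastest."""
    ready = _ready(activities)
    if not ready:
        return None
    return max(ready, key=lambda i: activities[i].rate)


# Toy flow: three activities with made-up rates and queue contents.
flow = [
    Activity("extract", rate=5.0, queue=deque([1, 2])),
    Activity("filter", rate=9.0, queue=deque([1])),
    Activity("load", rate=2.0, queue=deque([1, 2, 3])),
]
rr_choice = pick_round_robin(flow, last=0)
lq_choice = pick_largest_queue(flow)
mr_choice = pick_max_rate(flow)
```

On this toy flow the three policies disagree: the fair policy moves on to `filter` simply because it is next in line, the largest-queue policy picks `load`, and the maximum-rate policy picks `filter` for its consumption rate. This is the kind of divergence the experimental comparison in the paper is concerned with.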