Data warehouses collect large quantities of data from distributed sources into a single repository. A typical load to create or maintain a warehouse processes gigabytes of data, takes hours or even days to execute, and involves many complex and user-defined transformations of the data (e.g., find duplicates, resolve data inconsistencies, and add unique keys). If the load fails, one approach is to “redo” the entire load. A better approach is to resume the incomplete load from where it was interrupted. Unfortunately, traditional algorithms for resuming the load either impose unacceptable overhead during normal operation, or rely on the specifics of transformations. We develop a resumption algorithm called DR that imposes no overhead and relies only on the high-level properties of the transformations. Experiments using commercial software show that DR can lead to a ten-fold reduction in resumption time.
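The difference between redoing and resuming a load can be illustrated with a small sketch. This is not the DR algorithm, whose details are not given here. It only shows the general idea of skipping input whose output already reached the warehouse, under two simplifying assumptions: each input row produces exactly one output row with the same key, and the transformation is deterministic. The names `resume_load`, `normalize`, and the `id` key are hypothetical.

```python
from typing import Callable, Dict, Iterable

Row = Dict[str, object]


def resume_load(
    source: Iterable[Row],
    transform: Callable[[Row], Row],
    warehouse: Dict[object, Row],
    key: str = "id",
) -> int:
    """Load only rows missing from the warehouse; return how many were processed.

    Assumption (hypothetical, for illustration): one input row maps to one
    output row with the same key, so the warehouse contents alone tell us
    which input rows can be skipped after a failure.
    """
    processed = 0
    for row in source:
        if row[key] in warehouse:
            # Output from before the failure is already stored; skip the work.
            continue
        warehouse[row[key]] = transform(row)
        processed += 1
    return processed


def normalize(row: Row) -> Row:
    """Hypothetical stand-in for a user-defined transformation."""
    return {"id": row["id"], "name": str(row["name"]).strip().lower()}


source = [{"id": i, "name": f" Name{i} "} for i in range(5)]
warehouse: Dict[object, Row] = {}

# Simulate a load that failed after the first three rows were stored.
resume_load(source[:3], normalize, warehouse)

# Resuming re-reads the source but transforms only the rows still missing.
remaining = resume_load(source, normalize, warehouse)
```

A full redo would transform all five rows again, whereas the resumed load transforms only the rows that never reached the warehouse. Transformations that merge, split, or reorder rows break the one-to-one assumption used here, which is why a general resumption algorithm has to reason about transformation properties rather than a simple key check.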