RiTE: Providing On-Demand Data for Right-Time Data Warehousing

Authors:
Christian Thomsen; Torben Bach Pedersen; Wolfgang Lehner
Affiliations:
Dep. of Computer Science, Aalborg University, Aalborg, Denmark. chr@cs.aau.dk; Dep. of Computer Science, Aalborg University, Aalborg, Denmark. tbp@cs.aau.dk; Dep. of Computer Science, Dresden University of Technology, Dresden, Germany. wolfgang.lehner@tu-dresden.de
Venue:
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Year:
2008

Abstract

Data warehouses (DWs) have traditionally been loaded with data at regular time intervals, e.g., monthly, weekly, or daily, using fast bulk loading techniques. The recent trend is to insert all (or only some) new source data into the DW very quickly, yielding so-called near-real-time DWs (right-time DWs). This is done using regular INSERT statements, which results in insert speeds that are too low. There is thus a great need for a solution that makes inserted data available quickly while still providing bulk-load insert speeds. This paper presents RiTE ("Right-Time ETL"), a middleware system that provides exactly that. A data producer (ETL) can insert data that becomes available to data consumers on demand. RiTE includes an innovative main-memory based catalyst that provides fast storage and offers concurrency control. RiTE supports a number of policies that control the bulk movement of data based on user requirements for persistency, availability, freshness, etc. The system works transparently to both producer and consumers. The system is integrated with an open source DBMS, and experiments show that it provides "the best of both worlds", i.e., INSERT-like data availability, but with bulk-load speeds (up to 10 times faster).
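
The idea of buffering producer rows in memory and moving them to the warehouse in bulk only when a consumer asks for fresh data can be illustrated with a small sketch. This is not RiTE's implementation: the class `ToyCatalyst`, the `max_buffered` policy knob, the `facts` table, and the explicit `query` method are all hypothetical names chosen for illustration, SQLite stands in for the open source DBMS, and `executemany` stands in for a true bulk load. RiTE itself is described as transparent to producer and consumers, which this toy does not attempt.

```python
import sqlite3


class ToyCatalyst:
    """Hypothetical in-memory buffer between a producer and a warehouse table."""

    def __init__(self, conn, max_buffered=1000):
        self.conn = conn
        self.max_buffered = max_buffered  # assumed policy knob, not from the paper
        self.buffer = []
        self.bulk_moves = 0

    def insert(self, row):
        # Producer side: keep the row in memory instead of issuing an INSERT per row.
        self.buffer.append(row)
        if len(self.buffer) >= self.max_buffered:
            self.flush()

    def flush(self):
        # Move all buffered rows to the warehouse in one batch and one transaction.
        if not self.buffer:
            return
        with self.conn:
            self.conn.executemany(
                "INSERT INTO facts(id, amount) VALUES (?, ?)", self.buffer
            )
        self.buffer.clear()
        self.bulk_moves += 1

    def query(self, sql, fresh=True):
        # Consumer side: buffered rows are materialized only if freshness is requested.
        if fresh:
            self.flush()
        return self.conn.execute(sql).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE facts(id INTEGER, amount REAL)")
    catalyst = ToyCatalyst(conn, max_buffered=1000)
    for i in range(250):
        catalyst.insert((i, 1.0))
    stale = catalyst.query("SELECT COUNT(*) FROM facts", fresh=False)[0][0]
    fresh = catalyst.query("SELECT COUNT(*) FROM facts", fresh=True)[0][0]
    return stale, fresh, catalyst.bulk_moves
```

In this sketch a consumer that tolerates stale data sees none of the 250 buffered rows, while a consumer that requests fresh data triggers a single batched move and then sees all of them.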