Stream warehousing with DataDepot

Authors:
Lukasz Golab;Theodore Johnson;J. Spencer Seidel;Vladislav Shkapenyuk
Affiliations:
AT&T Laboratories - Research, Florham Park, NJ, USA;AT&T Laboratories - Research, Florham Park, NJ, USA;AT&T Laboratories - Research, Florham Park, NJ, USA;AT&T Laboratories - Research, Florham Park, NJ, USA
Venue:
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Year:
2009

Abstract

We describe DataDepot, a tool for generating warehouses from streaming data feeds such as network-traffic traces, router alerts, financial tickers, and transaction logs. DataDepot is a streaming data warehouse designed to automate the ingestion of streaming data from a wide variety of sources and to maintain complex materialized views over these sources. As a streaming warehouse, DataDepot resembles Data Stream Management Systems (DSMSs) in its emphasis on temporal data, best-effort consistency, and real-time response. As a data warehouse, however, DataDepot is designed to store tens to hundreds of terabytes of historical data, to allow time windows measured in years or decades, and to support both real-time queries on recent data and deep analyses on historical data. In this paper, we discuss the DataDepot architecture, with an emphasis on several of its novel and critical features. DataDepot is currently being used for five very large warehousing projects within AT&T; one of these warehouses ingests 500 Mbytes per minute (and is growing). We use these installations to illustrate streaming warehouse use and behavior, as well as the design choices made in developing DataDepot. We conclude with a discussion of DataDepot applications and the efficacy of some optimizations.
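
To put the quoted ingest rate in perspective, the following back-of-the-envelope sketch converts 500 Mbytes per minute into daily and yearly raw volume. It is only an illustration, not a figure from the paper: it assumes decimal units (1 TB = 1,000,000 MB), a constant rate with no growth, and no compression or index overhead. The function name `ingest_tb` is hypothetical.

```python
# Rough ingest-volume arithmetic for the 500 Mbytes/minute figure in the abstract.
# Assumptions (not from the paper): decimal units, constant rate,
# and no compression, indexing, or materialized-view overhead.

MB_PER_MINUTE = 500
MINUTES_PER_DAY = 24 * 60
MB_PER_TB = 1_000_000


def ingest_tb(days: float, mb_per_minute: float = MB_PER_MINUTE) -> float:
    """Return the raw terabytes ingested over `days` at a constant rate."""
    return mb_per_minute * MINUTES_PER_DAY * days / MB_PER_TB


if __name__ == "__main__":
    print(f"per day:  {ingest_tb(1):.2f} TB")
    print(f"per year: {ingest_tb(365):.1f} TB")
```

Under these assumptions, a single feed at this rate accumulates roughly 0.72 TB per day and about 263 TB per year of raw data, which is consistent with the abstract's stated design target of tens to hundreds of terabytes of history.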