We describe DataDepot, a tool for generating warehouses from streaming data feeds such as network-traffic traces, router alerts, financial tickers, and transaction logs. DataDepot is a streaming data warehouse designed to automate the ingestion of streaming data from a wide variety of sources and to maintain complex materialized views over these sources. As a streaming warehouse, DataDepot resembles Data Stream Management Systems (DSMSs) in its emphasis on temporal data, best-effort consistency, and real-time response. However, as a data warehouse, DataDepot is designed to store tens to hundreds of terabytes of historical data, allow time windows measured in years or decades, and support both real-time queries on recent data and deep analyses on historical data. In this paper we discuss the DataDepot architecture, with an emphasis on several of its novel and critical features. DataDepot is currently being used for five very large warehousing projects within AT&T; one of these warehouses ingests 500 Mbytes per minute (and is growing). We use these installations to illustrate how streaming warehouses are used and how they behave, and to explain design choices made in developing DataDepot. We conclude with a discussion of DataDepot applications and the efficacy of some optimizations.