ETL (Extract-Transform-Load) processing plays an increasingly critical role in analyzing business data and in taking appropriate business actions based on the results. As the volume of business data to be analyzed grows and quick responses become more critical for business success, there is strong demand for scalable, high-performance ETL processors. In this paper, we evaluate a distributed data stream processing engine called System S for this purpose. Because System S was originally built as a data stream processing engine rather than as an ETL tool, we first perform a qualitative study to see whether its programming model is suitable for representing an ETL workflow. Second, we conduct performance studies with a representative ETL scenario. Through this series of experiments, we found that the SPADE programming model and its runtime environment naturally fit the requirements of handling massive amounts of ETL data in a highly scalable manner.