SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Graph-based synopses for relational selectivity estimation
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
QoX-driven ETL design: reducing the cost of ETL consulting engagements
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Optimizing analytic data flows for multiple execution engines
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Hi-index | 0.00 |
As enterprises become more automated, real-time, and data-driven, they need to integrate new data sources and specialized processing engines. The traditional business intelligence architecture of Extract-Transform-Load (ETL) flows, followed by querying, reporting, and analytic operations, is being generalized to analytic data flows that utilize a variety of data types and operations. These complicated flows are difficult to design, implement and maintain since they span a variety of systems. Additionally, new design requirements may be imposed such as design for fault-tolerance, freshness, maintainability, sampling, etc. To reduce development time and maintenance costs, automation is needed. We present xPAD, our platform to manage analytic data flows. xPAD enables flow design. We show how these designs can be optimized, not just for performance, but for other objectives as well. xPAD is engine-agnostic. We show how it can generate executable code for a number of execution engines. It can also import existing flows from other engines and optimize those flows. In that way, it can transform a flow written for one engine into an optimized flow for a different engine. In our demonstration, we will also use various example flows to show optimization for different objectives and comparison of flow execution on different engines.