SIGMOD '86 Proceedings of the 1986 ACM SIGMOD international conference on Management of data
ACM Transactions on Database Systems (TODS)
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Optimizing Queries Across Diverse Data Sources
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Query Optimization in a Heterogeneous DBMS
VLDB '92 Proceedings of the 18th International Conference on Very Large Data Bases
Processing Queries Over Generalization Hierarchies in a Multidatabase System
VLDB '83 Proceedings of the 9th International Conference on Very Large Data Bases
Optimizing ETL Processes in Data Warehouses
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
An approach to optimize data processing in business processes
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Pig latin: a not-so-foreign language for data processing
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Parallelizing query optimization
Proceedings of the VLDB Endowment
SCOPE: easy and efficient parallel processing of massive data sets
Proceedings of the VLDB Endowment
Data integration flows for business intelligence
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
QoX-driven ETL design: reducing the cost of ETL consulting engagements
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads
Proceedings of the VLDB Endowment
Performance Evaluation and Benchmarking
Nephele/PACTs: a programming model and execution framework for web-scale analytical processing
Proceedings of the 1st ACM symposium on Cloud computing
Runtime measurements in the cloud: observing, analyzing, and reducing variance
Proceedings of the VLDB Endowment
The performance of MapReduce: an in-depth study
Proceedings of the VLDB Endowment
Leveraging business process models for ETL design
ER'10 Proceedings of the 29th international conference on Conceptual modeling
CIEL: a universal execution engine for distributed data-flow computing
Proceedings of the 8th USENIX conference on Networked systems design and implementation
Analytics-as-a-service: confluence of big data, cloud computing and software-as-a-service
CASCON '13 Proceedings of the 2013 Conference of the Center for Advanced Studies on Collaborative Research
Hi-index | 0.00 |
This paper addresses the challenge of optimizing analytic data flows for modern business intelligence (BI) applications. We first describe the changing nature of BI in today's enterprises as it has evolved from batch-based processes, in which the back-end extraction-transform-load (ETL) stage was separate from the front-end query and analytics stages, to near real-time data flows that fuse the back-end and front-end stages. We describe industry trends that force new BI architectures, e.g., mobile and cloud computing, semi-structured content, event and content streams as well as different execution engine architectures. For execution engines, the consequence of "one size does not fit all" is that BI queries and analytic applications now require complicated information flows as data is moved among data engines and queries span systems. In addition, new quality of service objectives are desired that incorporate measures beyond performance such as freshness (latency), reliability, accuracy, and so on. Existing approaches that optimize data flows simply for performance on a single system or a homogeneous cluster are insufficient. This paper describes our research to address the challenge of optimizing this new type of flow. We leverage concepts from earlier work in federated databases, but we face a much larger search space due to new objectives and a larger set of operators. We describe our initial optimizer that supports multiple objectives over a single processing engine. We then describe our research in optimizing flows for multiple engines and objectives and the challenges that remain.