Existing approaches for optimizing queries in data integration use decoupled strategies, attempting to optimize coverage and cost in two separate phases. Since sources tend to have a variety of access limitations, such phased optimization of cost and coverage can lead to expensive planning as well as highly inefficient plans. In this paper we present techniques for joint optimization of the cost and coverage of query plans. Our algorithms search in the space of parallel query plans that support multiple sources for each subgoal conjunct. The refinement of partial plans takes into account the potential parallelism between source calls and the binding compatibilities between the sources included in the plan. We start by introducing and motivating our query plan representation. We then briefly review how to compute the cost and coverage of a parallel plan. Next, we provide both a System-R style query optimization algorithm and a greedy local search algorithm for searching in the space of such query plans. Finally, we present a simulation study demonstrating that the plans generated by our approach are significantly better, in terms of both planning cost and plan execution cost, than those produced by existing approaches.
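To make the idea of a parallel plan concrete, the following is a minimal sketch, not the paper's actual cost or coverage models. It assumes, purely for illustration, that each source reports a coverage fraction and a response time, that sources for the same subgoal overlap independently and are called in parallel, that subgoals run one after another, and that a single weighted utility combines coverage and cost. The names `Source`, `utility`, and `greedy_plan` are hypothetical, and binding compatibilities between sources are left out.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Source:
    name: str
    coverage: float  # assumed fraction of the subgoal's answers this source returns
    cost: float      # assumed response time, in normalized units


def subgoal_coverage(sources):
    # Assumption: sources overlap independently, so an answer is missed
    # only when every source misses it.
    return 1.0 - prod(1.0 - s.coverage for s in sources)


def subgoal_cost(sources):
    # Sources for one subgoal are called in parallel; the slowest dominates.
    return max(s.cost for s in sources)


def plan_coverage(plan):
    # A plan is a list of source lists, one per subgoal.
    return prod(subgoal_coverage(srcs) for srcs in plan)


def plan_cost(plan):
    # Assumption: subgoals are evaluated one after another.
    return sum(subgoal_cost(srcs) for srcs in plan)


def utility(plan, weight=0.5):
    # Illustrative joint objective: reward coverage, penalize cost.
    return weight * plan_coverage(plan) - (1.0 - weight) * plan_cost(plan)


def greedy_plan(candidates, weight=0.5):
    # Start with the single best source per subgoal, then repeatedly add
    # the one source that most improves the utility of the whole plan.
    plan = [[max(srcs, key=lambda s: utility([[s]], weight))]
            for srcs in candidates]
    while True:
        base = utility(plan, weight)
        best_gain, best_move = 0.0, None
        for i, srcs in enumerate(candidates):
            for s in srcs:
                if s in plan[i]:
                    continue
                trial = plan[:i] + [plan[i] + [s]] + plan[i + 1:]
                gain = utility(trial, weight) - base
                if gain > best_gain:
                    best_gain, best_move = gain, (i, s)
        if best_move is None:
            return plan
        i, s = best_move
        plan[i] = plan[i] + [s]


# Hypothetical sources for a two-subgoal query.
candidates = [
    [Source("a1", 0.6, 0.2), Source("a2", 0.5, 0.2), Source("a3", 0.1, 0.9)],
    [Source("b1", 0.9, 0.3)],
]
plan = greedy_plan(candidates)
print([[s.name for s in srcs] for srcs in plan])
```

Because cost and coverage are scored together at every step, a slow source with little extra coverage (here `a3`) is never added, while a second fast source for the same subgoal raises coverage without lengthening the parallel call.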