In this paper we describe two optimization techniques tailored specifically to information gathering. The first is a greedy minimization algorithm that reduces an information gathering plan by removing redundant and overlapping information sources without loss of completeness. We then discuss a set of heuristics that guide this algorithm so that it removes costlier information sources first. In contrast to previous work, our approach can handle the recursive query plans that commonly arise in the presence of constrained sources. The second is a method for ordering accesses to sources so as to reduce execution cost. This problem differs significantly from traditional database query optimization: sources on the Internet have a variety of access limitations, and execution cost in information gathering is affected by both network traffic and connection setup costs. Furthermore, because of the autonomous and decentralized nature of the Web, very few cost statistics about the sources may be available. We propose a heuristic algorithm for ordering source calls that takes both access costs and traffic costs into account and can operate with very coarse statistics about sources (i.e., without depending on full source statistics). Finally, we discuss the implementation and empirical evaluation of these methods in Emerac, our prototype information gathering system.
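The idea of greedy, cost-guided minimization can be illustrated with a small sketch. This is not the algorithm implemented in Emerac: the abstract describes minimizing (possibly recursive) query plans, whereas the sketch below assumes a much simpler model in which each source is known to cover an explicit set of answer items. The function name `greedy_minimize`, the `coverage` and `cost` dictionaries, and the source names are all hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet, List


def greedy_minimize(
    coverage: Dict[str, FrozenSet[str]],
    cost: Dict[str, float],
) -> List[str]:
    """Greedily drop redundant sources, trying the costliest ones first.

    Assumption (not from the paper): completeness is modeled as plain set
    coverage, i.e. a source is redundant if the remaining sources together
    still cover every item that the full set of sources covers.
    """
    kept = set(coverage)
    # Everything the original plan can return; this must be preserved.
    required = frozenset().union(*coverage.values())

    # Heuristic: consider costlier sources first (ties broken by name).
    for name in sorted(coverage, key=lambda s: (-cost[s], s)):
        others = kept - {name}
        still_covered = frozenset().union(*(coverage[s] for s in others))
        if still_covered == required:
            # Removing this source loses nothing, so drop it.
            kept = others
    return sorted(kept)


# Hypothetical sources: C overlaps entirely with A and B but is expensive.
example_coverage = {
    "A": frozenset({"t1", "t2"}),
    "B": frozenset({"t2", "t3"}),
    "C": frozenset({"t1", "t2", "t3"}),
}
example_cost = {"A": 1.0, "B": 1.0, "C": 5.0}
minimized = greedy_minimize(example_coverage, example_cost)
```

In this toy setting the expensive source C is examined first and removed, because A and B together still cover every item; neither A nor B can then be dropped. The result of a greedy pass depends on the order in which sources are tried, which is why the cost-based ordering heuristic matters.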