Optimizing Recursive Information Gathering Plans in EMERAC

Authors:
Subbarao Kambhampati;Eric Lambrecht;Ullas Nambiar;Zaiqing Nie;Gnanaprakasam Senthil
Affiliations:
Department of Computer Science and Engineering, Arizona State University, Tempe, AZ 85287, USA. rao@asu.edu;Department of Computer Science and Engineering, Arizona State University, Tempe, AZ 85287, USA;Department of Computer Science and Engineering, Arizona State University, Tempe, AZ 85287, USA;Department of Computer Science and Engineering, Arizona State University, Tempe, AZ 85287, USA;Department of Computer Science and Engineering, Arizona State University, Tempe, AZ 85287, USA
Venue:
Journal of Intelligent Information Systems
Year:
2004

Abstract

In this paper we describe two optimization techniques that are specially tailored for information gathering. The first is a greedy minimization algorithm that minimizes an information gathering plan by removing redundant and overlapping information sources without loss of completeness. We then discuss a set of heuristics that guide the greedy minimization algorithm so as to remove costlier information sources first. In contrast to previous work, our approach can handle the recursive query plans that commonly arise in the presence of constrained sources. Second, we present a method for ordering the access to sources so as to reduce the execution cost. This problem differs significantly from the traditional database query optimization problem, because sources on the Internet have a variety of access limitations and the execution cost in information gathering is affected both by network traffic and by connection setup costs. Furthermore, because of the autonomous and decentralized nature of the Web, very few cost statistics about the sources may be available. We propose a heuristic algorithm for ordering source calls that takes these constraints into account. Specifically, our algorithm considers both access costs and traffic costs, and is able to operate with very coarse statistics about sources (i.e., without depending on full source statistics). Finally, we discuss the implementation and empirical evaluation of these methods in Emerac, our prototype information gathering system.
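
The abstract does not spell out the minimization procedure, so the following is only a toy sketch of the general idea of greedy, cost-guided removal of redundant sources. It is not Emerac's algorithm: the Source class, its cost and covers fields, and the set-coverage test used as a stand-in for the "no loss of completeness" check are all assumptions made for illustration, and recursive plans are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """Hypothetical source description (not from the paper)."""
    name: str
    cost: float          # assumed coarse cost estimate for calling the source
    covers: frozenset    # toy stand-in for the answers the source can return


def plan_coverage(sources):
    """Union of everything the given sources can return."""
    covered = set()
    for s in sources:
        covered |= s.covers
    return covered


def greedy_minimize(plan):
    """Drop redundant sources, trying the costliest ones first.

    A source is removed only if the remaining sources still cover
    everything the original plan covered (the toy completeness check).
    """
    target = plan_coverage(plan)
    kept = list(plan)
    for candidate in sorted(plan, key=lambda s: s.cost, reverse=True):
        remaining = [s for s in kept if s is not candidate]
        if plan_coverage(remaining) == target:
            kept = remaining
    return kept


# Example plan with overlapping sources (made-up data).
plan = [
    Source("A", 5.0, frozenset({1, 2, 3})),
    Source("B", 1.0, frozenset({1, 2})),
    Source("C", 2.0, frozenset({3})),
    Source("D", 4.0, frozenset({2, 3})),
]
minimized = greedy_minimize(plan)
print([s.name for s in minimized])
```

In this example the two costliest sources are redundant given the cheaper ones, so they are dropped and coverage is unchanged. Like any greedy pass, the result depends on the removal order and is not guaranteed to be the cheapest possible plan.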