Learning response time for WebSources using query feedback and application in query optimization

Authors:
Jean-Robert Gruser;Louiqa Raschid;Vladimir Zadorozhny;Tao Zhan
Affiliations:
Netforce, Levallois-Perret, France, E-mail: gruser@netforce.fr;University of Maryland, College Park, MD 20742, USA/ E-mail: {louiqa,vladimir,taozhan}@umiacs.umd.edu;University of Maryland, College Park, MD 20742, USA/ E-mail: {louiqa,vladimir,taozhan}@umiacs.umd.edu;University of Maryland, College Park, MD 20742, USA/ E-mail: {louiqa,vladimir,taozhan}@umiacs.umd.edu
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2000


Abstract

The rapid growth of the Internet and support for interoperability protocols have increased the number of Web-accessible sources (WebSources). Current wrapper-mediator architectures need to be extended with a wrapper cost model (WCM) for WebSources that can estimate the response time (delay) to access sources, as well as other relevant statistics. In this paper, we present the Web prediction tool (WebPT), a tool based on learning from query feedback from WebSources. The WebPT uses the dimensions time of day, day, and quantity of data to learn response times from a particular WebSource and to predict the expected response time (delay) for a query. Experimental data was collected from several sources, and those dimensions that were significant in estimating the response time were determined. We then trained the WebPT on the collected data to use the three dimensions mentioned above and to predict the response time, as well as a confidence in the prediction. We describe the WebPT learning algorithms and report on the WebPT learning for WebSources. Our research shows that we can improve the quality of learning by tuning the WebPT features, e.g., training the WebPT using a logarithm of the input training data, including significant dimensions in the WebPT, or changing the ordering of dimensions. We compared the WebPT with more traditional neural network (NN) learning, and we briefly report on the comparison. We then demonstrate how the WebPT prediction of delay may be used by a scrambling-enabled optimizer. A scrambling algorithm identifies critical points of delay, where it decides to scramble (modify) a plan, attempting to hide the expected delay by computing some other part of the plan that is unaffected by the delay. We explore the space of real delay at a WebSource versus the WebPT prediction of this delay, with respect to critical points of delay in specific plans. We identify those cases where WebPT overestimation or underestimation of the real delay results in a penalty in the scrambling-enabled optimizer, and those cases where there is no penalty. Using the experimental data and WebPT learning, we test how well the WebPT minimizes these penalties.
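
To make the idea of learning response times from query feedback more concrete, the following is a minimal sketch, not the WebPT algorithm itself, which the abstract does not specify. It assumes a simple bucketing scheme over the three dimensions named above (time of day, day, quantity of data), stores the logarithm of each observed response time, and returns a prediction together with a rough confidence score. The class name `BucketPredictor`, the bucket widths, and the confidence formula are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


class BucketPredictor:
    """Illustrative stand-in for a query-feedback learner (not the WebPT)."""

    def __init__(self, hour_width=6, size_width=10_000):
        # Assumed bucket widths: 6-hour time slots, 10,000-byte size ranges.
        self.hour_width = hour_width
        self.size_width = size_width
        # Maps a bucket key to the log response times observed in it.
        self.samples = defaultdict(list)

    def _bucket(self, hour, day, size_bytes):
        # Discretize the three dimensions into one bucket key.
        return (day, hour // self.hour_width, size_bytes // self.size_width)

    def observe(self, hour, day, size_bytes, response_ms):
        # Record feedback from one query, stored in log space.
        key = self._bucket(hour, day, size_bytes)
        self.samples[key].append(math.log(response_ms))

    def predict(self, hour, day, size_bytes):
        # Return (estimate_ms, confidence), or None for an unseen bucket.
        logs = self.samples.get(self._bucket(hour, day, size_bytes))
        if not logs:
            return None
        estimate = math.exp(mean(logs))  # geometric mean of the samples
        spread = pstdev(logs)            # spread in log space
        n = len(logs)
        # Assumed heuristic: more samples and less spread give more confidence.
        confidence = (n / (n + 1)) / (1.0 + spread)
        return estimate, confidence
```

In this sketch, two observations of 100 ms and 400 ms in the same bucket yield a prediction of about 200 ms, since averaging in log space gives the geometric mean. An optimizer could compare such an estimate, weighted by its confidence, against a delay threshold when deciding whether to scramble a plan; that decision logic is outside the scope of this sketch.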