Optimal Information Gathering on the Internet with Time and Cost Constraints

  • Authors:
  • Oren Etzioni;Steve Hanks;Tao Jiang;Omid Madani

  • Affiliations:
  • -;-;-;-

  • Venue:
  • SIAM Journal on Computing
  • Year:
  • 2000

Quantified Score

Hi-index 0.00

Visualization

Abstract

The World Wide Web provides access to vast amounts of information, but content providers are considering charging for the information and services they supply. Thus the consumer may face the problem of balancing the benefit of asking for information against the cost (in terms of both money and time) of acquiring it. We study information-gathering strategies that maximize the expected value to the consumer. In our model there is a single information request, which has a known benefit to the consumer. To satisfy the request, queries can be sent simultaneously or in sequence to any of a finite set of independent information sources. For each source we know the monetary cost of making the query, the amount of time it will take, and the probability that the source will be able to provide the requested information. A policy specifies which sources to contact at which times, and the expected value of the policy can be defined as some function of the likelihood that the policy will yield an answer, the expected benefit, and the monetary cost and time delay associated with executing the policy. The problem is to find an expected-value-maximizing policy.We explore four variants of the objective function V: (i) V consists only of the benefit term subject to threshold constraints on both total cost and total elapsed time, (ii) V is linear in the expected total cost of the policy subject to the constraint that the total elapsed time never exceeds some deadline, (iii) V is linear in the expected total elapsed time subject to the constraint that the total cost never exceeds some threshold, and (iv) V is linear in the expected total monetary cost and the expected time delay of the policy. The problems of devising an optimal querying policy for all four variants and approximating an optimal querying policy for variants (iii) and (iv) are shown to be NP-hard. For (i), and with a mild simplifying assumption for (iii), we give a fully polynomial time approximation scheme. For (ii), we consider batched querying policies, and design an O(n2) time approximation algorithm with ratio $\frac{1}{2}$ and a polynomial time approximation scheme for optimal single-batch policies, and an O(kn2) time approximation algorithm with ratio $\frac{1}{5}$ for optimal k-batch policies.