A utility theoretic approach to determining optimal wait times in distributed information retrieval

  • Authors:
  • Kartik Hosanagar

  • Affiliations:
  • The Wharton School at The University of Pennsylvania, Philadelphia, PA

  • Venue:
  • Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

Distributed IR systems query a large number of IR servers, merge the retrieved results and display them to users. Since different servers handle collections of different sizes, have different processing and bandwidth capacities, there can be considerable heterogeneity in their response times. The broker in the distributed IR system thus has to make decisions regarding terminating searches based on perceived value of waiting -- retrieving more documents -- and the costs imposed on users by waiting for more responses. In this paper, we apply utility theory to formulate the broker's decision problem. The problem is a stochastic nonlinear program. We use Monte Carlo simulations to demonstrate how the optimal wait time may be determined in the context of a comparison shopping engine that queries multiple store websites for price and product information. We use data gathered from 30 stores for a set of 60 books. Our research demonstrates how a broker can leverage information about past retrievals regarding distributions of server response time and relevance scores to optimize its performance. Our main contribution is the formulation of the decision model for optimal wait time and proposal of a solution method. Our results suggest that the optimal wait time is highly sensitive to the manner in which users value from a set of retrieved results differs from the sum of user value from each result evaluated independently. We also find that the optimal wait time increases with the size of the distributed collections, but only if user utility from a set of results is nearly equal to the sum of utilities from each result.