Multidimensional search result diversification: diverse search results for diverse users

  • Authors:
  • Sumit Bhatia

  • Affiliations:
  • Pennsylvania State University, University Park, USA

  • Venue:
  • Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Hundreds of millions of people today rely on Web based Search Engines to satisfy their information needs. In order to meet the expectations of this vast and diverse user population, the search engine should present a list of results such that the probability of satisfying the average user is maximized. This leads us to the problem of Search Result Diversification. Given a user submitted query, the search engine should include results that are relevant to the user query and at the same time, diverse enough to meet the expectations of diverse user populations. However, it is not clear in what respect the results should be diversified. Much of the current work in diversity focuses on ambiguous and underspecified queries and tries to include results corresponding to diverse interpretations of the ambiguous query. This is not always sufficient. My analysis of a commercial web search engine's logs reveals that even for well-specified informational queries, click entropy is very high indicating that different users prefer different types of documents. Very recently, a diversification algorithm fine-tuned for such informational queries has been proposed. Further, high click entropies were also observed for a large fraction of transactional queries. One major goal of my PhD thesis will then be to identify the various possible dimensions along which the search results can be diversified. Having such an information will enhance our understanding about the expectations of an average user from the search engine. By utilizing aggregate statistics about queries, users and their interaction with the search engine for different queries, more concrete evidences about diverse user preferences as well as relative importance of different diversity dimensions can be derived. Once we know different diversity dimensions, the next natural question is: given a query, how can we determine the diversification requirement best suited for the query? For some queries sub-topic coverage may be more important while for others diversification with respect to document source or stylistics might be important. This problem is related to the problem of selective diversification where the goal is to identify queries for which diversification techniques should be used. However, in addition, we are also interested in identifying different diversity classes a given query belongs to. Further, for some queries it may be required to diversify along multiple diversity dimensions. In such cases, it is also important to determine the relative importance of different diversity dimensions for the given query. By utilizing past user interaction data, query level features (like query clarity, entropy, lexical features etc.) and document level features (e.g. popularity, content quality, previous click history etc.), classifiers for diversification requirements can be developed. Given a user query, once we know the type of diversity requirements for the user, an appropriate diversification technique is required. I would like to study the problem of simultaneously diversifying search results along multiple dimensions, as discussed above. One possible way here could be to build upon the nugget based framework introduced by Clarke et al. where we represent each document as a set of nuggets, each nugget corresponding to a diversity dimension.