(big) usage data in web search

  • Authors: Ricardo Baeza-Yates; Yoelle Maarek
  • Affiliations: Yahoo! Labs, Barcelona, Spain; Yahoo! Labs, Haifa, Israel
  • Venue: Proceedings of the sixth ACM international conference on Web search and data mining
  • Year: 2013

Abstract

Web search, which has its roots in the mature field of information retrieval, has evolved tremendously over the last 15 years. The field encountered its first revolution when it started to deal with huge numbers of Web pages. The next major step came when engines started to consider the structure of the Web graph and to leverage link analysis in both crawling and ranking. Finally, a more discreet but no less critical step was made when search engines started to monitor and exploit the numerous (mostly implicit) signals that users provide while interacting with the search engine. In this tutorial we focus on this "revolution" of large-scale usage data.
In the first part of this tutorial, we focus on usage data, which typically refers to any type of information provided by the user while interacting with the search engine. It first appears in raw form as a set of individual signals, but it is typically mined only after multiple signals have been aggregated and linked to the same interaction event. The two major types of such data are (1) query streams, which include the query string that the user issued, together with the timestamp of the query, a user identifier, and possibly the IP address of the machine on which the browser runs, and (2) click data, which include a reference to the element the user clicked on the page, together with the timestamp, the user identifier, possibly the IP address, the rank of the link if it is a result, etc.
Exploiting usage data in its multiple forms has brought an unprecedented wealth of implicit information to Web search. In the second part of this tutorial, we discuss some of the key Web search applications that it made possible. One such example is the query spelling correction feature now found in all search engines. In fact, after years of very sophisticated spell-checking research, it turned out that simply counting similar queries within a small edit distance would, in most cases, surface the most popular spelling as the correct one: a beautiful and simple demonstration of the wisdom-of-crowds principle.
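
The counting idea behind query spelling correction can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the tutorial's or any search engine's actual implementation: the `QueryEvent` record and its field names, the helper functions `count_queries`, `edit_distance`, and `suggest_spelling`, and the distance threshold of 2 are all hypothetical choices made for this example. The sketch aggregates a query stream into frequency counts and then returns the most frequent logged query within a small Levenshtein distance of the input.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class QueryEvent:
    """One query-stream record. Field names are illustrative assumptions."""
    query: str
    timestamp: float
    user_id: str
    ip: Optional[str] = None


def count_queries(events: Iterable[QueryEvent]) -> Counter:
    """Aggregate raw query events into per-query frequencies."""
    return Counter(e.query.strip().lower() for e in events)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def suggest_spelling(query: str, counts: Counter, max_distance: int = 2) -> str:
    """Return the most frequent logged query within max_distance edits.

    Falls back to the input itself when no more popular neighbor exists.
    The threshold of 2 is an assumption, not a value from the tutorial.
    """
    best, best_count = query, counts.get(query, 0)
    for candidate, count in counts.items():
        if count > best_count and edit_distance(query, candidate) <= max_distance:
            best, best_count = candidate, count
    return best


if __name__ == "__main__":
    # Toy query log: the popular spelling dominates the variants.
    log = (
        [QueryEvent("britney spears", 1.0, "u1")] * 5
        + [QueryEvent("brittany spears", 2.0, "u2")] * 2
        + [QueryEvent("britny spears", 3.0, "u3")]
    )
    print(suggest_spelling("britny spears", count_queries(log)))
```

This linear scan over all distinct queries is chosen for clarity; at the scale the abstract describes, a real system would need some form of candidate indexing rather than comparing against every logged query.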