Analysis of web search engine query session and clicked documents

  • Authors:
  • David Nettleton; Liliana Calderón-Benavides; Ricardo Baeza-Yates

  • Affiliations:
  • University Pompeu Fabra, Barcelona, Spain; University Pompeu Fabra, Barcelona, Spain and University Autónoma of Bucaramanga, Bucaramanga, Colombia; University Pompeu Fabra, Barcelona, Spain and Yahoo! Research Barcelona, Barcelona, Spain

  • Venue:
  • WebKDD'06 Proceedings of the 8th Knowledge discovery on the web international conference on Advances in web mining and web usage analysis
  • Year:
  • 2006

Abstract

Identifying a user's intention or interest by analyzing the queries submitted to a search engine and the documents selected as answers to those queries can be very useful for offering more adequate results to that user. In this chapter we present the analysis of a Web search engine query log from two different perspectives: the query session and the clicked document. In the first perspective, that of the query session, we process and analyze Web search engine query and click data for the query session (query + clicked results) conducted by the user. We initially state some hypotheses for possible user types and quality profiles for the user session, based on descriptive variables of the session. In the second perspective, that of the clicked document, we repeat the process from the perspective of the documents (URLs) selected. We also initially define possible document categories and select descriptive variables to define the documents. We apply a systematic data mining process to click data, contrasting unsupervised (Kohonen) and supervised (C4.5) methods to cluster and model the data. From the user session perspective, the aim is to identify profiles and rules that relate to theoretical user behavior and user session "quality"; from the document perspective, the aim is to identify document profiles that relate to theoretical user behavior and document (URL) organization.