Mining search and browse logs for web search: A Survey

  • Authors:
  • Daxin Jiang;Jian Pei;Hang Li

  • Affiliations:
  • Microsoft Corporation;Simon Fraser University;Huawei Technologies

  • Venue:
  • ACM Transactions on Intelligent Systems and Technology (TIST) - Survey papers, special sections on the semantic adaptive social web, intelligent systems for health informatics, regular papers
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Huge amounts of search log data have been accumulated at Web search engines. Currently, a popular Web search engine may receive billions of queries and collect terabytes of records about user search behavior daily. Beside search log data, huge amounts of browse log data have also been collected through client-side browser plugins. Such massive amounts of search and browse log data provide great opportunities for mining the wisdom of crowds and improving Web search. At the same time, designing effective and efficient methods to clean, process, and model log data also presents great challenges. In this survey, we focus on mining search and browse log data for Web search. We start with an introduction to search and browse log data and an overview of frequently-used data summarizations in log mining. We then elaborate how log mining applications enhance the five major components of a search engine, namely, query understanding, document understanding, document ranking, user understanding, and monitoring and feedback. For each aspect, we survey the major tasks, fundamental principles, and state-of-the-art methods.