Recent and robust query auto-completion

  • Authors:
  • Stewart Whiting;Joemon M. Jose

  • Affiliations:
  • University of Glasgow, Glasgow, United Kingdom;University of Glasgow, Glasgow, United Kingdom

  • Venue:
  • Proceedings of the 23rd international conference on World wide web
  • Year:
  • 2014

Quantified Score

Hi-index 0.00

Visualization

Abstract

Query auto-completion (QAC) is a common interactive feature that assists users in formulating queries by providing completion suggestions as they type. In order for QAC to minimise the user's cognitive and physical effort, it must: (i) suggest the user's intended query after minimal input keystrokes, and (ii) rank the user's intended query highly in completion suggestions. Typically, QAC approaches rank completion suggestions by their past popularity. Accordingly, QAC is usually very effective for previously seen and consistently popular queries. Users are increasingly turning to search engines to find out about unpredictable emerging and ongoing events and phenomena, often using previously unseen or unpopular queries. Consequently, QAC must be both robust and time-sensitive -- that is, able to sufficiently rank both consistently and recently popular queries in completion suggestions. To address this trade-off, we propose several practical completion suggestion ranking approaches, including: (i) a sliding window of query popularity evidence from the past 2-28 days, (ii) the query popularity distribution in the last N queries observed with a given prefix, and (iii) short-range query popularity prediction based on recently observed trends. Using real-time simulation experiments, we extensively investigated the parameters necessary to maximise QAC effectiveness for three openly available query log datasets with prefixes of 2-5 characters: MSN and AOL (both English), and Sogou 2008 (Chinese). Optimal parameters vary for each query log, capturing the differing temporal dynamics and querying distributions. Results demonstrate consistent and language-independent improvements of up to 9.2% over a non-temporal QAC baseline for all query logs with prefix lengths of 2-3 characters. This work is an important step towards more effective QAC approaches.