Finding Related Search Engine Queries by Web Community Based Query Enrichment

  • Authors:
  • Lin Li; Shingo Otsuka; Masaru Kitsuregawa

  • Affiliations:
  • Department of Information and Communication Engineering, The University of Tokyo, Tokyo, Japan and School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China; National Institute for Materials Science, Tsukuba, Japan; Institute of Industrial Science, The University of Tokyo, Tokyo, Japan

  • Venue:
  • World Wide Web
  • Year:
  • 2010

Abstract

Conventional approaches to finding related search engine queries rely on the terms shared by two queries to measure their relatedness. However, search engine queries are usually short, and the term overlap between two queries is very small, so using query terms as a feature space cannot accurately estimate relatedness. Alternative feature spaces are needed to enrich term-based search queries. In this paper, given a search query, we first extract the Web pages accessed by users from Japanese Web access logs, which record users' individual and collective behavior. From these accessed Web pages we can generally obtain two kinds of feature spaces, i.e., content-sensitive (e.g., nouns) and content-ignorant (e.g., URLs), to enrich the expressions of search queries. The relatedness between search queries can then be estimated on their enriched expressions. Our experimental results show that the URL feature space produces much lower precision scores than the noun feature space, which, however, is not applicable to non-text pages, dynamic pages, and so on. It is crucial to improve the quality of the URL (content-ignorant) feature space since it is generally available for all types of Web pages. We propose a novel content-ignorant feature space, called Web community, which is created from a Japanese Web page archive by exploiting link analysis. Experimental results show that the proposed Web community feature space generates much better results than the URL feature space.
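
The enrichment idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a similarity measure, so cosine similarity is an assumption here, and the access log, the page URLs, the community labels, and the function names `enrich` and `cosine` are all hypothetical toy values chosen for illustration.

```python
from collections import Counter
from math import sqrt

def enrich(query, access_log, page_features):
    """Build a feature vector for a query from the pages users accessed for it."""
    vec = Counter()
    for page in access_log.get(query, []):
        vec.update(page_features.get(page, []))
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical access log: query -> pages accessed after issuing it.
access_log = {
    "cheap flights": ["a.example/1", "b.example/2"],
    "airline tickets": ["c.example/3", "d.example/4"],
}

# URL feature space: each page is represented by its own URL.
url_features = {page: [page] for pages in access_log.values() for page in pages}

# Web community feature space: each page is represented by a community label
# (hypothetical labels; the paper derives communities by link analysis).
community_features = {
    "a.example/1": ["travel"],
    "b.example/2": ["travel"],
    "c.example/3": ["news"],
    "d.example/4": ["travel"],
}

q1, q2 = "cheap flights", "airline tickets"
url_sim = cosine(enrich(q1, access_log, url_features),
                 enrich(q2, access_log, url_features))
community_sim = cosine(enrich(q1, access_log, community_features),
                       enrich(q2, access_log, community_features))
print(f"URL similarity: {url_sim:.3f}")
print(f"Community similarity: {community_sim:.3f}")
```

In this toy data the two queries share no accessed URL, so the URL feature space reports no relatedness, while grouping pages into communities lets the two queries overlap. This mirrors, in a simplified way, why a coarser content-ignorant feature space might help when URL overlap is sparse.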