Paraphrasing invariance coefficient: measuring para-query invariance of search engines

  • Authors:
  • Tomasz Imielinski;Jinyun Yan;Yihan Fang;Kurt Eldridge;Huiwen Yu;Peter Kelly

  • Affiliations:
  • Ask.com;Rutgers University;Ask.com;Ask.com;Ask.com;Ask.com

  • Venue:
  • Proceedings of the 3rd International Semantic Search Workshop
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

Paraphrasing is the restatement (or reuse) of text which preserves its meaning in another form. A para-query is a para-phrase of a search query. Humans easily recognize para-queries, but search engines are still far away from it. We claim that in order for a search engine to be called semantic it is necessary that it recognizes para-queries by returning the same search results for all para-queries of a given query. Recognizing para-queries is an important and desired ability of a search engine. It can relieve users of the burden of rephrasing queries in order to improve the relevance of results. In this paper, we cover two main threads: monolingual para-query generation (PG) and para-query recognition measurement (PRM). Para-query generation aims to automatically generate as many English para-queries as possible for a given query. We propose a novel game "Rephraser" to tackle this problem. Hundreds of para-query templates are extracted from the game's output and used to compose tens of thousands of para-queries. The goal of para-query recognition measurement is to examine to what level search engines recognize para-queries. We propose the concept of paraphrasing invariance coefficient (PIC) which is defined as the probability that search results are the same for a pair of para-queries. By using para-queries generated from the game, we design experiments to measure search engines' PIC. Results show that today's leading search engines are still inferior to human ability in recognizing para-queries. It is a long way ahead for search to be truly semantic.