Paraphrasing invariance coefficient: measuring para-query invariance of search engines

Authors:
Tomasz Imielinski;Jinyun Yan;Yihan Fang;Kurt Eldridge;Huiwen Yu;Peter Kelly
Affiliations:
Ask.com;Rutgers University;Ask.com;Ask.com;Ask.com;Ask.com
Venue:
Proceedings of the 3rd International Semantic Search Workshop
Year:
2010

Abstract

Paraphrasing is the restatement (or reuse) of a text in another form that preserves its meaning. A para-query is a paraphrase of a search query. Humans easily recognize para-queries, but search engines are still far from doing so. We claim that for a search engine to be called semantic, it must recognize para-queries by returning the same search results for all para-queries of a given query. Recognizing para-queries is an important and desirable ability of a search engine: it can relieve users of the burden of rephrasing queries to improve the relevance of results. In this paper, we cover two main threads: monolingual para-query generation (PG) and para-query recognition measurement (PRM). Para-query generation aims to automatically generate as many English para-queries as possible for a given query. We propose a novel game, "Rephraser", to tackle this problem. Hundreds of para-query templates are extracted from the game's output and used to compose tens of thousands of para-queries. The goal of para-query recognition measurement is to examine how well search engines recognize para-queries. We propose the concept of the paraphrasing invariance coefficient (PIC), defined as the probability that the search results are the same for a pair of para-queries. Using para-queries generated from the game, we design experiments to measure the PIC of search engines. The results show that today's leading search engines remain inferior to humans at recognizing para-queries. Search still has a long way to go before it is truly semantic.
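The PIC definition above can be made concrete with a small estimator. The following is a minimal sketch, not the authors' procedure: it assumes that "the same search results" means identical ordered top-k result lists, and it estimates the probability as the fraction of para-query pairs that meet that criterion. The names `estimate_pic`, `toy_search`, and `top_k`, along with the toy index, are hypothetical and introduced only for illustration.

```python
from itertools import combinations
from typing import Callable, Sequence


def estimate_pic(
    para_query_groups: Sequence[Sequence[str]],
    search: Callable[[str], Sequence[str]],
    top_k: int = 10,
) -> float:
    """Estimate PIC as the fraction of para-query pairs with identical results.

    Each group holds queries assumed to be paraphrases of one another.
    Assumption: two result lists count as "the same" only when their
    top_k entries match exactly, in the same order.
    """
    same_pairs = 0
    total_pairs = 0
    for group in para_query_groups:
        # Query the engine once per para-query and keep the top_k results.
        results = [tuple(search(query)[:top_k]) for query in group]
        # Compare every unordered pair of para-queries within the group.
        for first, second in combinations(results, 2):
            total_pairs += 1
            if first == second:
                same_pairs += 1
    if total_pairs == 0:
        raise ValueError("need at least one group with two or more para-queries")
    return same_pairs / total_pairs


# Hypothetical stand-in for a search engine: a fixed query-to-results table.
_TOY_INDEX = {
    "how tall is everest": ["u1", "u2", "u3"],
    "what is the height of everest": ["u1", "u2", "u3"],
    "everest height": ["u1", "u4", "u3"],
}


def toy_search(query: str) -> Sequence[str]:
    return _TOY_INDEX[query]


groups = [list(_TOY_INDEX)]
pic = estimate_pic(groups, toy_search, top_k=3)
print(f"estimated PIC: {pic:.3f}")
```

In the toy data, only one of the three para-query pairs returns identical lists, so the estimate is one third. A looser notion of sameness, such as overlap of the result sets regardless of order, could be substituted for the equality check if exact matching proves too strict.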