Paraphrasing is the restatement (or reuse) of a text in another form that preserves its meaning. A para-query is a paraphrase of a search query. Humans recognize para-queries easily, but search engines are still far from doing so. We claim that for a search engine to be called semantic, it must recognize para-queries by returning the same search results for all para-queries of a given query. This ability is important and desirable because it relieves users of the burden of rephrasing queries to improve the relevance of results. In this paper, we cover two main threads: monolingual para-query generation (PG) and para-query recognition measurement (PRM). Para-query generation aims to automatically generate as many English para-queries as possible for a given query. We propose a novel game, "Rephraser", to tackle this problem. Hundreds of para-query templates are extracted from the game's output and used to compose tens of thousands of para-queries. The goal of para-query recognition measurement is to examine the extent to which search engines recognize para-queries. We propose the paraphrasing invariance coefficient (PIC), defined as the probability that search results are the same for a pair of para-queries. Using para-queries generated from the game, we design experiments to measure the PIC of search engines. The results show that today's leading search engines remain inferior to humans at recognizing para-queries. Search still has a long way to go before it is truly semantic.
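The PIC definition above can be illustrated with a minimal sketch. It is not the paper's measurement procedure: it assumes that "the same search results" means identical ordered top-k result lists, and it estimates the probability as the fraction of agreeing pairs within each group of para-queries. The names `estimate_pic`, `same_results`, and the toy `fake_index` are hypothetical and stand in for a real search engine.

```python
from itertools import combinations
from typing import Callable, Iterable, Sequence


def same_results(a: Sequence[str], b: Sequence[str], k: int = 10) -> bool:
    """Assumed equality test: the top-k result lists match exactly, order included."""
    return list(a[:k]) == list(b[:k])


def estimate_pic(
    groups: Iterable[Sequence[str]],
    search: Callable[[str], Sequence[str]],
    k: int = 10,
) -> float:
    """Estimate PIC as the fraction of para-query pairs with the same results.

    Each element of `groups` holds queries that paraphrase one another;
    pairs are only formed inside a group, never across groups.
    """
    agree = 0
    total = 0
    for group in groups:
        results = [search(q) for q in group]
        for r1, r2 in combinations(results, 2):
            total += 1
            if same_results(r1, r2, k):
                agree += 1
    if total == 0:
        raise ValueError("need at least one pair of para-queries")
    return agree / total


# Toy stand-in for a search engine (hypothetical data, not real results).
fake_index = {
    "how to bake bread": ["u1", "u2"],
    "bread baking instructions": ["u1", "u2"],
    "ways to bake bread": ["u1", "u3"],
}
groups = [list(fake_index)]
pic = estimate_pic(groups, fake_index.__getitem__, k=2)
```

In this toy group only one of the three pairs returns identical lists, so the estimate is one third. A looser notion of sameness, such as overlap of the top-k sets, would be a different design choice and would yield a different coefficient.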