Unsupervised query segmentation using only query logs

Authors:
Nikita Mishra; Rishiraj Saha Roy; Niloy Ganguly; Srivatsan Laxman; Monojit Choudhury
Affiliations:
Indian Institute of Technology Kharagpur, Kharagpur, India; Indian Institute of Technology Kharagpur, Kharagpur, India; Indian Institute of Technology Kharagpur, Kharagpur, India; Microsoft Research India, Bengaluru, India; Microsoft Research India, Bengaluru, India
Venue:
Proceedings of the 20th International Conference Companion on World Wide Web
Year:
2011

Abstract

We introduce an unsupervised query segmentation scheme that uses query logs as the only resource and can effectively capture the structural units in queries. We believe that Web search queries have a unique syntactic structure which is distinct from that of English or a bag-of-words model. The segments discovered by our scheme help understand this underlying grammatical structure. We apply a statistical model based on Hoeffding's Inequality to mine significant word n-grams from queries and subsequently use them for segmenting the queries. Evaluation against manually segmented queries shows that this technique can detect rare units that are missed by our Pointwise Mutual Information (PMI) baseline.
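
The abstract does not spell out the statistical model, so the following is only a minimal sketch of how a Hoeffding-based significance score for word bigrams could be computed from a query log and then used for segmentation. Several choices here are assumptions made for illustration and are not taken from the paper: the null model (words of a query are randomly permuted, so in a query of l distinct words a given word is immediately followed by another given word with probability 1/l), the restriction to bigrams, the greedy joining rule, the threshold value, and the names `hoeffding_bigram_scores` and `segment`. For N queries containing both words, X of which have them adjacent, and null expectation E, Hoeffding's Inequality gives P(X - E >= t) <= exp(-2t^2/N); the score used below is the negative log of that bound.

```python
import math
from collections import defaultdict


def hoeffding_bigram_scores(queries):
    """Score each ordered word pair (a, b) by -log of a Hoeffding bound.

    Assumed null model (illustrative, not from the paper): the words of a
    query are randomly permuted, so in a query of l distinct words, a is
    immediately followed by b with probability 1/l.
    """
    # pair -> [N, X, E]: queries containing both words, queries where the
    # pair is adjacent in order, and the expected adjacency count under null
    stats = defaultdict(lambda: [0, 0, 0.0])
    for query in queries:
        tokens = query.split()
        length = len(tokens)
        # Skip one-word queries and queries with repeated words, for which
        # the 1/l adjacency probability would not hold.
        if length < 2 or len(set(tokens)) != length:
            continue
        adjacent = set(zip(tokens, tokens[1:]))
        for a in tokens:
            for b in tokens:
                if a == b:
                    continue
                entry = stats[(a, b)]
                entry[0] += 1
                if (a, b) in adjacent:
                    entry[1] += 1
                entry[2] += 1.0 / length

    scores = {}
    for pair, (n, x, expected) in stats.items():
        t = x - expected
        # Hoeffding: P(X - E >= t) <= exp(-2 t^2 / N); score = -log(bound).
        scores[pair] = 2.0 * t * t / n if t > 0 else 0.0
    return scores


def segment(query, scores, threshold):
    """Greedily join adjacent words whose bigram score meets the threshold."""
    tokens = query.split()
    if not tokens:
        return []
    segments = [[tokens[0]]]
    for prev, cur in zip(tokens, tokens[1:]):
        if scores.get((prev, cur), 0.0) >= threshold:
            segments[-1].append(cur)
        else:
            segments.append([cur])
    return [" ".join(words) for words in segments]


# Toy query log (made up for illustration only).
toy_log = [
    "new york hotels",
    "cheap new york flights",
    "new york weather",
    "hotels new york",
    "cheap flights",
    "weather new york today",
]
toy_scores = hoeffding_bigram_scores(toy_log)
print(segment("cheap new york flights", toy_scores, threshold=2.0))
# prints ['cheap', 'new york', 'flights']
```

On a log this small the scores are not meaningful in themselves; the sketch only shows how a pair that co-occurs adjacently far more often than chance (here "new york") accumulates a much larger score than pairs seen together once or twice.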