Unsupervised extraction of template structure in web search queries

Authors: Sandeep Pandey; Kunal Punera
Affiliations: Yahoo! Research, Sunnyvale, CA, USA; RelateIQ, Mountain View, CA, USA
Venue: Proceedings of the 21st international conference on World Wide Web
Year: 2012

Abstract

Web search queries encode the user's search intent, and extracting structured information from them can facilitate central search engine operations such as improving the ranking of search results and advertisements. Not surprisingly, this area has attracted considerable attention from the research community in the last few years. The problem is challenging, however, because search queries tend to be extremely succinct: a condensation of the user's search needs into a bare-minimum set of keywords. In this paper we consider the problem of extracting, with no manual intervention, the hidden structure behind the observed search queries in a domain: the origins of the constituent keywords as well as the manner in which the individual keywords are assembled. We formalize important properties of the problem and then give a principled solution, based on generative models, that satisfies these properties. Using manually labeled data, we show that the query templates extracted by our solution are superior to those discovered by strong baseline methods. The query templates extracted by our approach have potential uses in many search engine tasks, including query answering and advertisement matching and targeting. In this paper we study one such task, estimating Query-Advertisability, and empirically demonstrate that using extracted template information can improve performance over and above the current state of the art.