Improving verbose queries using subset distribution

Authors:
Xiaobing Xue;Samuel Huston;W. Bruce Croft
Affiliations:
University of Massachusetts, Amherst, Amherst, MA, USA;University of Massachusetts, Amherst, Amherst, MA, USA;University of Massachusetts, Amherst, Amherst, MA, USA
Venue:
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Year:
2010

Abstract

Dealing with verbose (or long) queries poses a new challenge for information retrieval. Selecting a subset of the original query (a "sub-query") has been shown to be an effective method for improving these queries. In this paper, the distribution of sub-queries ("subset distribution") is formally modeled within a well-grounded framework. Specifically, sub-query selection is considered as a sequential labeling problem, where each query word in a verbose query is assigned a label of "keep" or "don't keep". A novel Conditional Random Field model is proposed to generate the distribution of sub-queries. This model captures the local and global dependencies between query words and directly optimizes the expected retrieval performance on a training set. The experiments, based on different retrieval models and performance measures, show that the proposed model can generate high-quality sub-query distributions and can significantly outperform state-of-the-art techniques.
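
The keep / don't-keep labeling view described in the abstract can be illustrated with a small sketch. This is not the authors' Conditional Random Field: the per-word weights in `unary`, the neighbour bonus `pair_bonus`, the example query, and the function names are all hypothetical, chosen only to show how a log-linear score over label sequences might be normalized into a distribution over sub-queries. The paper's model learns its parameters by optimizing expected retrieval performance on a training set, which is not reproduced here, and exhaustive enumeration as used below is only practical for short queries.

```python
import math
from itertools import product


def subset_distribution(words, unary, pair_bonus):
    """Return {labels: probability} over all non-empty keep/drop labelings.

    `labels` is a tuple of 0/1 flags aligned with `words` (1 = keep).
    `unary` and `pair_bonus` are illustrative, hand-set weights.
    """
    scores = {}
    for labels in product((0, 1), repeat=len(words)):
        if not any(labels):
            continue  # the empty sub-query is not a usable query
        # Local evidence: one weight per kept word.
        score = sum(unary[w] for w, keep in zip(words, labels) if keep)
        # Neighbour dependency: a bonus when two adjacent words are both kept.
        score += sum(pair_bonus for a, b in zip(labels, labels[1:]) if a and b)
        scores[labels] = score
    # Softmax over labelings, shifted by the maximum for numerical stability.
    top = max(scores.values())
    z = sum(math.exp(s - top) for s in scores.values())
    return {labels: math.exp(s - top) / z for labels, s in scores.items()}


def sub_query(words, labels):
    """Materialize the sub-query selected by a labeling."""
    return [w for w, keep in zip(words, labels) if keep]


# Hypothetical verbose query and weights, for illustration only.
words = ["identify", "any", "efforts", "curb", "poaching"]
unary = {"identify": -1.0, "any": -1.5, "efforts": 0.2, "curb": 0.8, "poaching": 1.5}

dist = subset_distribution(words, unary, pair_bonus=0.3)
best = max(dist, key=dist.get)
print(sub_query(words, best), round(dist[best], 3))
```

Keeping the whole distribution, rather than only the top-scoring labeling, is what would allow several sub-queries to be weighted and combined at retrieval time.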