People are seldom aware that their search queries frequently fail to match a majority of the relevant documents. This may not be a serious problem for topics with a large and diverse set of relevant documents, but it substantially increases the chance of search failure for less popular search needs. We aim to address the mismatch problem by developing accurate and simple queries that require minimal effort to construct. This is achieved by targeting retrieval interventions at the query terms that are likely to mismatch relevant documents. For a given topic, the proportion of relevant documents that do not contain a term measures the probability that the term mismatches relevant documents, called the term mismatch probability. Recent research demonstrates that this probability can be estimated reliably prior to retrieval. Typically, it is used in probabilistic retrieval models to provide query-dependent term weights. This paper develops a new use: automatic diagnosis of term mismatch. A search engine can use the diagnosis to suggest manual query reformulation, guide interactive query expansion, guide automatic query expansion, or motivate other responses. The research described here uses the diagnosis to guide interactive query expansion and to create Boolean conjunctive normal form (CNF) structured queries that selectively expand 'problem' query terms while leaving the rest of the query untouched. Experiments with TREC Ad-hoc and Legal Track datasets demonstrate that, with high-quality manual expansion, this diagnostic approach can reduce user effort by 33% and produce simple and effective structured queries that surpass their bag-of-words counterparts.
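
The two ideas in the abstract, the term mismatch probability and the selective CNF expansion of 'problem' terms, can be illustrated with a small sketch. It is not the paper's implementation. It assumes relevance judgments are already available, so the mismatch probability is computed directly rather than predicted prior to retrieval, and the function names, the toy documents, the expansion terms, and the generic AND/OR query syntax are all hypothetical.

```python
from typing import Dict, List, Set


def mismatch_probability(term: str, relevant_docs: List[Set[str]]) -> float:
    """Share of relevant documents that do NOT contain the term."""
    if not relevant_docs:
        raise ValueError("need at least one relevant document")
    missing = sum(1 for doc in relevant_docs if term not in doc)
    return missing / len(relevant_docs)


def build_cnf_query(
    query_terms: List[str],
    mismatch: Dict[str, float],
    expansions: Dict[str, List[str]],
    n_expand: int,
) -> str:
    """Expand only the n_expand highest-mismatch terms; AND all clauses."""
    ranked = sorted(query_terms, key=lambda t: mismatch[t], reverse=True)
    problem_terms = set(ranked[:n_expand])
    clauses = []
    for term in query_terms:
        alternatives = [term]
        if term in problem_terms:
            # Expansion terms are supplied by hand here (assumption).
            alternatives += expansions.get(term, [])
        if len(alternatives) > 1:
            clauses.append("(" + " OR ".join(alternatives) + ")")
        else:
            clauses.append(term)
    return " AND ".join(clauses)


# Toy relevant set for one topic; each document is a set of its terms.
relevant_docs = [
    {"teen", "smoking", "prevention"},
    {"adolescent", "tobacco", "prevention"},
    {"youth", "smoking", "prevention"},
    {"teen", "smoking", "control"},
]
query_terms = ["teen", "smoking", "prevention"]
mismatch = {t: mismatch_probability(t, relevant_docs) for t in query_terms}
expansions = {"teen": ["adolescent", "youth"], "smoking": ["tobacco"]}

cnf = build_cnf_query(query_terms, mismatch, expansions, n_expand=1)
print(mismatch)
print(cnf)
```

In this toy data, "teen" is missing from two of the four relevant documents, so it is the term diagnosed for expansion, while "smoking" and "prevention" are left untouched even though an expansion for "smoking" is available.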