Context sensitive stemming for web search

Authors:
Fuchun Peng;Nawaaz Ahmed;Xin Li;Yumao Lu
Affiliations:
Yahoo Inc., Sunnyvale, CA;Yahoo Inc., Sunnyvale, CA;Yahoo Inc., Sunnyvale, CA;Yahoo Inc., Sunnyvale, CA
Venue:
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2007

Abstract

Traditionally, stemming has been applied to Information Retrieval tasks by transforming words in documents to their root form before indexing, and applying a similar transformation to query terms. Although it increases recall, this naive strategy does not work well for Web Search since it lowers precision and requires a significant amount of additional computation. In this paper, we propose a context sensitive stemming method that addresses these two issues. Two unique properties make our approach feasible for Web Search. First, based on statistical language modeling, we perform context sensitive analysis on the query side. Before submitting the query to the search engine, we accurately predict which morphological variants of a query term are useful expansions. This dramatically reduces the number of bad expansions, which in turn reduces the cost of additional computation and improves precision at the same time. Second, our approach performs context sensitive document matching for those expanded variants. This conservative strategy serves as a safeguard against spurious stemming, and it turns out to be very important for improving precision. Using word pluralization handling as an example of our stemming approach, our experiments on a major Web search engine show that, by stemming only 29% of the query traffic, we can improve relevance as measured by average Discounted Cumulative Gain (DCG5) by 6.1% on these queries and 1.8% over all query traffic.
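
The query-side step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an add-one-smoothed bigram language model, a naive singular/plural toggle, and an acceptance threshold, none of which are specified in the abstract. The bigram counts are invented toy values, and the names `plural_variant`, `select_expansions`, and `min_ratio` are hypothetical.

```python
import math
from collections import Counter

# Invented toy bigram counts standing in for a language model estimated
# from real text; none of these numbers come from the paper.
BIGRAM_COUNTS = Counter({
    ("<s>", "hotel"): 50, ("<s>", "hotels"): 50,
    ("hotel", "price"): 40, ("hotel", "prices"): 40,
    ("hotels", "price"): 1, ("hotels", "prices"): 1,
    ("price", "comparison"): 30, ("prices", "comparison"): 20,
    ("comparison", "</s>"): 50, ("comparisons", "</s>"): 1,
})

# Vocabulary and per-context totals derived from the toy counts.
VOCAB = {word for bigram in BIGRAM_COUNTS for word in bigram}
CONTEXT_COUNTS = Counter()
for (first, _second), count in BIGRAM_COUNTS.items():
    CONTEXT_COUNTS[first] += count


def log_prob(tokens):
    """Add-one-smoothed bigram log-probability of a token sequence."""
    padded = ["<s>"] + list(tokens) + ["</s>"]
    total = 0.0
    for first, second in zip(padded, padded[1:]):
        numerator = BIGRAM_COUNTS[(first, second)] + 1
        denominator = CONTEXT_COUNTS[first] + len(VOCAB)
        total += math.log(numerator / denominator)
    return total


def plural_variant(word):
    """Naive singular/plural toggle (assumption: ignores irregular forms)."""
    return word[:-1] if word.endswith("s") else word + "s"


def select_expansions(query, min_ratio=0.5):
    """Keep a variant only if it fits the query context nearly as well
    as the original term, judged by the language-model probability ratio."""
    tokens = query.split()
    base = log_prob(tokens)
    kept = {}
    for i, word in enumerate(tokens):
        variant = plural_variant(word)
        candidate = tokens[:i] + [variant] + tokens[i + 1:]
        ratio = math.exp(log_prob(candidate) - base)
        if ratio >= min_ratio:
            kept[word] = variant
    return kept


print(select_expansions("hotel price comparison"))
```

With these toy counts, only "price" is expanded (to "prices"). "hotels" and "comparisons" are rejected because they fit the surrounding words poorly, which mirrors the idea of expanding only a small, well-chosen share of query terms. The document-side matching step is not covered by this sketch.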