Extracting search-focused key n-grams for relevance ranking in web search

Authors:
Chen Wang;Keping Bi;Yunhua Hu;Hang Li;Guihong Cao
Affiliations:
Fudan University, Shanghai, China;Peking University, Beijing, China;Microsoft Research Asia, Beijing, China;Microsoft Research Asia, Beijing, China;Microsoft Corporation, Redmond, WA, USA
Venue:
Proceedings of the fifth ACM international conference on Web search and data mining
Year:
2012

Citing 27
Cited 0

Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval

SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
A language modeling approach to information retrieval

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Information retrieval as statistical translation

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
KEA: practical automatic keyphrase extraction

Proceedings of the fourth ACM conference on Digital libraries
IR evaluation methods for retrieving highly relevant documents

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
A vector space model for automatic indexing

Communications of the ACM
Document language models, query models, and risk minimization for information retrieval

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Learning Algorithms for Keyphrase Extraction

Information Retrieval
Optimizing search engines using clickthrough data

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Title extraction from bodies of HTML documents and its application to web page retrieval

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
A Markov random field model for term dependencies

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Query chains: learning to rank from implicit feedback

Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Improved automatic keyword extraction given more linguistic knowledge

EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing
TREC: Experiment and Evaluation in Information Retrieval (Digital Libraries and Electronic Publishing)

TREC: Experiment and Evaluation in Information Retrieval (Digital Libraries and Electronic Publishing)
Improving web search ranking by incorporating user behavior information

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Training linear SVMs in linear time

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Using the structure of HTML documents to improve retrieval

USITS'97 Proceedings of the USENIX Symposium on Internet Technologies and Systems on USENIX Symposium on Internet Technologies and Systems
Contextual Ranking of Keywords Using Click Data

ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Smoothing clickthrough data for web search ranking

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
A ranking approach to keyphrase extraction

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Learning to Rank for Information Retrieval

Foundations and Trends in Information Retrieval
Domain-specific keyphrase extraction

IJCAI'99 Proceedings of the 16th international joint conference on Artificial intelligence - Volume 2
Learning document aboutness from implicit user feedback and document structure

Proceedings of the 18th ACM conference on Information and knowledge management
Anatomy of the long tail: ordinary people with extraordinary tastes

Proceedings of the third ACM international conference on Web search and data mining
Multi-style language model for web scale information retrieval

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Parameterized concept weighting in verbose queries

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Learning to Rank for Information Retrieval and Natural Language Processing

Learning to Rank for Information Retrieval and Natural Language Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

In web search, relevance ranking of popular pages is relatively easy, because of the inclusion of strong signals such as anchor text and search log data. In contrast, with less popular pages, relevance ranking becomes very challenging due to a lack of information. In this paper the former is referred to as head pages, and the latter tail pages. We address the challenge by learning a model that can extract search-focused key n-grams from web pages, and using the key n-grams for searches of the pages, particularly, the tail pages. To the best of our knowledge, this problem has not been previously studied. Our approach has four characteristics. First, key n-grams are search-focused in the sense that they are defined as those which can compose "good queries" for searching the page. Second, key n-grams are learned in a relative sense using learning to rank techniques. Third, key n-grams are learned using search log data, such that the characteristics of key n-grams in the search log data, particularly in the heads; can be applied to the other data, particularly to the tails. Fourth, the extracted key n-grams are used as features of the relevance ranking model also trained with learning to rank techniques. Experiments validate the effectiveness of the proposed approach with large-scale web search datasets. The results show that our approach can significantly improve relevance ranking performance on both heads and tails; and particularly tails, compared with baseline approaches. Characteristics of our approach have also been fully investigated through comprehensive experiments.