Biasing web search results for topic familiarity

Authors:
Giridhar Kumaran;Rosie Jones;Omid Madani
Affiliations:
University of Massachusetts, Amherst, MA;Yahoo! Research, Pasadena, CA;Yahoo! Research, Pasadena, CA
Venue:
Proceedings of the 14th ACM international conference on Information and knowledge management
Year:
2005

Citing 1

Random Forests
Machine Learning

Cited 5

Personalizing web search results by reading level
Proceedings of the 20th ACM international conference on Information and knowledge management

An unsupervised ranking method based on a technical difficulty terrain
Proceedings of the 20th ACM international conference on Information and knowledge management

Characterizing web content, user interests, and search behavior by reading level and topic
Proceedings of the fifth ACM international conference on Web search and data mining

Ranking Text Documents Based on Conceptual Difficulty Using Term Embedding and Sequential Discourse Cohesion
WI-IAT '12 Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01

Why Do Users Perceive Search Tasks As Difficult? Exploring Difficulty in Different Task Types
Proceedings of the Symposium on Human-Computer Interaction and Information Retrieval

Abstract

Depending on a web searcher's familiarity with a query's target topic, it may be more appropriate to show her introductory or advanced documents. The TREC HARD [1] track defined topic familiarity as meta-data associated with a user's query. We instead define a user-independent and query-independent model of the topic familiarity required to read a document, so that a document can be matched to a given user in response to a query. An introductory web page is defined as "a web page that doesn't presuppose any background knowledge of the topic it is on, and to an extent introduces or defines the key terms in the topic," while an advanced web page is defined as "a web page that assumes sufficient background knowledge of the topic it is on, and familiarity with the key technical/important terms in the topic, and potentially builds on them." We develop a method for biasing the initial mix of documents returned by a search engine to increase the number of documents of the desired familiarity level up to position 5, and up to position 10. Our method involves building a supervised text classifier that incorporates features based on reading level, the distribution of stop-words in the text, and non-text features such as average line length. Using this familiarity classifier, we achieve statistically significant improvements in reranking the result set to show introductory documents higher up the ranked list. Our classifier can be integrated into current search engine technology without major modifications to existing architectures.
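
The following Python sketch illustrates the general shape of the approach the abstract describes: compute a few surface features per document, then promote documents scored as introductory within a candidate pool of results. It is not the authors' implementation. The stop-word list, the sentence-length and word-length proxies for reading level, the function names `extract_features` and `rerank`, the `pool` and `threshold` parameters, and the `intro_score` callable (a stand-in for a trained classifier) are all assumptions made for illustration.

```python
import re

# Tiny illustrative stop-word list (an assumption; the paper's list is not given here).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it",
              "that", "for", "on", "with", "as", "this"}


def extract_features(text):
    """Return a few hypothetical familiarity features for one document."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)  # guard against empty documents
    return {
        # Share of tokens that are stop-words.
        "stop_word_fraction": sum(w in STOP_WORDS for w in words) / n_words,
        # Non-text layout feature: mean characters per non-empty line.
        "avg_line_length": sum(len(ln) for ln in lines) / max(len(lines), 1),
        # Crude reading-level proxies (assumed, not the paper's measures).
        "avg_words_per_sentence": len(words) / max(len(sentences), 1),
        "avg_word_length": sum(len(w) for w in words) / n_words,
    }


def rerank(results, intro_score, pool=20, threshold=0.5):
    """Move documents scored as introductory to the front of the candidate pool.

    results:     list of (doc_id, text) pairs in the engine's original order.
    intro_score: callable mapping text to a score in [0, 1]; a stand-in for
                 a trained familiarity classifier.
    Original order is preserved within each group, and documents beyond the
    pool are left untouched.
    """
    head, tail = results[:pool], results[pool:]
    scored = [(doc, intro_score(doc[1])) for doc in head]
    intro = [doc for doc, score in scored if score >= threshold]
    other = [doc for doc, score in scored if score < threshold]
    return intro + other + tail
```

In a fuller version, `intro_score` would wrap a supervised model trained on labeled introductory and advanced pages using the output of `extract_features`. Keeping the reranker as a post-processing step over the engine's existing result list reflects the abstract's point that the classifier needs no major architectural changes.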