Machine Learning
Personalizing web search results by reading level
Proceedings of the 20th ACM international conference on Information and knowledge management
An unsupervised ranking method based on a technical difficulty terrain
Proceedings of the 20th ACM international conference on Information and knowledge management
Characterizing web content, user interests, and search behavior by reading level and topic
Proceedings of the fifth ACM international conference on Web search and data mining
WI-IAT '12 Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01
Why Do Users Perceive Search Tasks As Difficult? Exploring Difficulty in Different Task Types
Proceedings of the Symposium on Human-Computer Interaction and Information Retrieval
Depending on a web searcher's familiarity with a query's target topic, it may be more appropriate to show her introductory or advanced documents. The TREC HARD [1] track defined topic familiarity as metadata associated with a user's query. We instead define a user-independent and query-independent model of the topic familiarity required to read a document, so that it can be matched to a given user in response to a query. An introductory web page is defined as "a web page that doesn't presuppose any background knowledge of the topic it is on, and to an extent introduces or defines the key terms in the topic," while an advanced web page is defined as "a web page that assumes sufficient background knowledge of the topic it is on, and familiarity with the key technical/important terms in the topic, and potentially builds on them." We develop a method for biasing the initial mix of documents returned by a search engine to increase the number of documents of the desired familiarity level up to position 5 and up to position 10. Our method involves building a supervised text classifier that incorporates features based on reading level, the distribution of stop-words in the text, and non-text features such as average line length. Using this familiarity classifier, we achieve statistically significant improvements in reranking the result set to show introductory documents higher up the ranked list. Our classifier can be seamlessly integrated into current search engine technology without any major modifications to existing architectures.
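The feature families and the reranking step named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the stop-word list, the use of average word length as a crude stand-in for reading level, and the function names `familiarity_features` and `rerank_by_familiarity` are all assumptions made for illustration. The abstract does not specify the classifier, so the reranker simply accepts any scoring function (`intro_score`, hypothetical) that returns higher values for more introductory documents.

```python
import re

# Tiny illustrative stop-word list (an assumption; the abstract does not name one).
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "is", "and", "that", "it", "on", "for"}


def familiarity_features(text):
    """Compute three simple features for one document.

    stop_word_fraction: share of tokens that are stop-words.
    avg_word_length:    crude proxy for reading level (assumption).
    avg_line_length:    mean characters per non-blank line (non-text feature).
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words:
        return {"stop_word_fraction": 0.0, "avg_word_length": 0.0, "avg_line_length": 0.0}
    stop_count = sum(1 for w in words if w in STOP_WORDS)
    return {
        "stop_word_fraction": stop_count / len(words),
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "avg_line_length": sum(len(ln) for ln in lines) / len(lines),
    }


def rerank_by_familiarity(ranked_docs, intro_score, top_k=10):
    """Reorder only the first top_k results by descending intro_score.

    Python's sort is stable, so documents with equal scores keep the
    search engine's original relative order. Results below top_k are untouched.
    """
    head = sorted(ranked_docs[:top_k], key=intro_score, reverse=True)
    return head + ranked_docs[top_k:]
```

In a fuller pipeline, the feature dictionaries would presumably be fed to a trained supervised classifier whose output probability serves as `intro_score`; calling `rerank_by_familiarity(results, intro_score, top_k=5)` or `top_k=10` would correspond to the two cutoffs mentioned in the abstract. Restricting the reordering to the top of the list leaves the underlying ranking function unchanged, which is in keeping with the claim that no major architectural modification is needed.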