Characterizing web content, user interests, and search behavior by reading level and topic

Authors:
Jin Young Kim; Kevyn Collins-Thompson; Paul N. Bennett; Susan T. Dumais
Affiliations:
University of Massachusetts, Amherst, Amherst, MA, USA; Microsoft Research, Redmond, WA, USA; Microsoft Research, Redmond, WA, USA; Microsoft Research, Redmond, WA, USA
Venue:
Proceedings of the fifth ACM international conference on Web search and data mining
Year:
2012

Abstract

A user's expertise or ability to understand a document on a given topic is an important aspect of that document's relevance. However, this aspect has not been well explored in information retrieval systems, particularly those at Web scale, where the great diversity of content, users, and tasks presents an especially challenging search problem. To help improve our modeling and understanding of this diversity, we apply automatic text classifiers, based on reading difficulty and topic prediction, to estimate a novel type of profile for important entities in Web search -- users, websites, and queries. These profiles capture topic and reading level distributions, which we then use in conjunction with search log data to characterize and compare different entities. We find that reading level and topic distributions provide an important new representation of Web content and user interests, and that using both together is more effective than using either one separately. In particular, we find that: 1) the reading level of Web content and the diversity of visitors to a website can vary greatly by topic; 2) the degree to which a user's profile matches a site's profile is closely correlated with the user's preference for the website in search results; and 3) site or URL profiles can be used to predict 'expertness', that is, whether a given site or URL is oriented toward expert or non-expert users. Our findings provide strong evidence in favor of jointly incorporating reading level and topic distribution metadata into a variety of critical tasks in Web information systems.
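
The profile idea in the abstract can be illustrated with a minimal sketch. It assumes each visited page has already been labelled with a topic and a reading level by some classifier, builds a joint distribution over (topic, level) pairs for a user and for a site, and scores how well the two match. The label sets, the smoothing constant, the function names, and the use of Jensen-Shannon divergence as the match score are all assumptions made for illustration; the abstract does not state which topics, levels, or comparison measure the authors used.

```python
import math
from collections import Counter

# Hypothetical label sets; the abstract does not list the actual topics or levels.
TOPICS = ["health", "science", "sports"]
LEVELS = [1, 2, 3]  # coarse reading-level buckets (assumed)


def build_profile(labelled_pages, smoothing=1e-3):
    """Estimate a joint P(topic, level) from a list of (topic, level) pairs.

    Additive smoothing (an assumption) keeps every cell strictly positive.
    Pairs outside the TOPICS x LEVELS grid are ignored.
    """
    counts = Counter(labelled_pages)
    cells = [(t, l) for t in TOPICS for l in LEVELS]
    total = sum(counts[c] for c in cells) + smoothing * len(cells)
    return {c: (counts[c] + smoothing) / total for c in cells}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical profiles, at most 1."""
    def kl(a, b):
        return sum(a[c] * math.log2(a[c] / b[c]) for c in a if a[c] > 0)

    # Mixture of the two profiles, cell by cell.
    m = {c: 0.5 * (p[c] + q[c]) for c in p}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy data: a user who mostly reads easy health pages, and two candidate sites.
user = build_profile([("health", 1), ("health", 1), ("sports", 2)])
easy_site = build_profile([("health", 1), ("health", 2), ("sports", 2)])
hard_site = build_profile([("science", 3), ("science", 3), ("health", 3)])

# A lower divergence means the site's profile is closer to the user's profile.
print(js_divergence(user, easy_site) < js_divergence(user, hard_site))
```

Using one joint distribution, rather than separate topic and reading-level distributions, is one simple way to reflect the abstract's claim that the two signals work better together than apart; other representations would serve equally well for a sketch.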