Readability is a crucial presentation attribute that web summarization algorithms consider when generating a query-biased web summary. Readability also forms an important component of real-time monitoring of commercial search-engine results, since recent studies show that the readability of web summaries affects clickthrough behavior and thus user satisfaction and advertising revenue. The standard approach to measuring readability is to collect a corpus of random queries and their corresponding search result summaries, have a human judge each summary for readability, and report the average score. This process is time-consuming and expensive. Moreover, manual evaluation cannot be used during real-time summary generation. In this paper we propose a machine learning approach to the problem. Using a corpus of the kind described above, we extract summary features that we expect to characterize readability. We then estimate a model (a gradient boosted decision tree) that predicts human judgments from these features. This model can be used in real time to estimate the readability of new (unseen) web search summaries, and it can also be used within the summary generation process. We present results on approximately 5000 editorial judgments collected over the course of a year and show examples where the model predicts quality well and where it disagrees with human judgments. We compare the model with previous models of readability, most notably Collins-Thompson-Callan, Fog, and Flesch-Kincaid, and find that it correlates substantially better with editorial judgments, as measured by Pearson's correlation coefficient. The learning algorithm also provides the relative importance of the features used.
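The pipeline the abstract describes (summary features, a boosted tree model fit to human judgments, evaluation by Pearson's correlation) can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration: the two features (average word length and ellipsis count), the eight judgment values, the use of one-split trees (stumps), and the hyperparameters are all hypothetical and are not taken from the paper.

```python
import math

def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def fit_stump(X, residuals):
    """Return (feature, threshold, left_mean, right_mean) of the one-split
    regression tree with the lowest squared error on the residuals."""
    best = None
    for j in range(len(X[0])):
        # Every unique value except the largest is a candidate threshold,
        # so both sides of the split are always non-empty.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, lmean, rmean)
    return best[1:]

def fit_boosted_stumps(X, y, rounds=50, learning_rate=0.3):
    """Gradient boosting for squared loss: each round fits a stump to the
    current residuals and adds a shrunken copy of it to the ensemble."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        j, t, lmean, rmean = fit_stump(X, residuals)
        stumps.append((j, t, lmean, rmean))
        preds = [p + learning_rate * (lmean if row[j] <= t else rmean)
                 for row, p in zip(X, preds)]
    return base, stumps, learning_rate

def predict(model, row):
    """Score one feature vector with a fitted ensemble."""
    base, stumps, lr = model
    return base + sum(lr * (lm if row[j] <= t else rm)
                      for j, t, lm, rm in stumps)

# Synthetic illustration data (NOT from the paper): each row is
# [average word length, ellipsis count]; y is a made-up 1-5 judgment.
X = [[4.2, 0], [5.1, 1], [6.3, 2], [4.8, 0],
     [7.0, 3], [5.5, 1], [6.8, 2], [4.0, 0]]
y = [5, 4, 2, 5, 1, 3, 2, 5]

model = fit_boosted_stumps(X, y)
preds = [predict(model, row) for row in X]
r = pearson(preds, y)
print(f"training-set Pearson correlation: {r:.3f}")
```

The correlation printed here is computed on the training data, so it only confirms that the sketch runs; a real comparison would score held-out summaries. The same `pearson` helper could be applied to the scores of a baseline formula to compare it with the learned model against the same judgments.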