Assessing user-specific difficulty of documents

Authors:
Mari-Sanna Paukkeri;Marja Ollikainen;Timo Honkela
Affiliations:
Department of Information and Computer Science, Aalto University School of Science, PO Box 15400, FI-00076 Aalto, Finland;Department of Information and Computer Science, Aalto University School of Science, PO Box 15400, FI-00076 Aalto, Finland;Department of Information and Computer Science, Aalto University School of Science, PO Box 15400, FI-00076 Aalto, Finland
Venue:
Information Processing and Management: an International Journal
Year:
2013


Abstract

On the web, a huge variety of text collections contain knowledge in different domains of expertise, such as technology or medicine. The texts are written for different purposes and thus for readers with different levels of expertise in the domain. Texts intended for professionals may be incomprehensible to a lay person, and texts for lay people may lack the detailed information a professional needs. Many information retrieval applications, such as search engines, would offer a better user experience if they could select the text sources that best fit the user's level of expertise. In this article, we propose a novel approach for assessing the difficulty level of a document: our method assesses difficulty for each user separately. The method makes it possible, for instance, to offer information in a personalised manner based on the user's knowledge of different domains. The method is based on comparing the terms that appear in a document with the terms known by the user. We present two ways to collect information about the terminology the user knows: directly, by asking users about the difficulty of terms, or, as a novel automatic approach, indirectly, by analysing texts written by the users. We examine the applicability of the methodology with text documents in the medical domain. The results show that the method is able to distinguish between documents written for lay people and documents written for experts.
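
The core idea of comparing document terms with terms the user knows can be illustrated with a minimal sketch. The abstract does not specify the scoring function or the term extraction, so everything here is an assumption for illustration: the regex tokenizer, the helper names `tokenize`, `known_terms_from_writing` and `difficulty`, and the choice of "fraction of distinct document terms the user does not know" as the difficulty score. The user's vocabulary is taken from texts the user has written, in the spirit of the indirect approach described above.

```python
import re


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def known_terms_from_writing(user_texts):
    """Assume that every term the user has written is a term the user knows."""
    known = set()
    for text in user_texts:
        known.update(tokenize(text))
    return known


def difficulty(document, known_terms):
    """Return the fraction of distinct document terms missing from known_terms.

    This score is a hypothetical stand-in, not the measure used in the article.
    """
    terms = set(tokenize(document))
    if not terms:
        return 0.0
    return len(terms - known_terms) / len(terms)


# Hypothetical user profile built from one text the user wrote.
known = known_terms_from_writing(["The patient has a heart attack"])

lay_score = difficulty("A heart attack", known)
expert_score = difficulty("Myocardial infarction", known)
```

Under these assumptions, the lay phrasing scores lower than the expert phrasing for this user, because all of its terms appear in the user's own writing. A different user profile would give different scores for the same two documents, which is the sense in which difficulty is user-specific.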