Since millions seek health information online, it is vital for this information to be comprehensible. Most studies use readability formulas, which ignore vocabulary, and conclude that online health information is too difficult. We developed a vocabulary-based, naïve Bayes classifier to distinguish between three difficulty levels in text. It proved 98% accurate in a 250-document evaluation. We compared our classifier with readability formulas for 90 new documents with different origins and asked representative human evaluators, an expert and a consumer, to judge each document. Average readability grade levels for educational and commercial pages were 10th grade or higher, too difficult according to current literature. In contrast, the classifier showed that 70-90% of these pages were written at an intermediate, appropriate level, indicating that vocabulary usage is frequently appropriate in text considered too difficult by readability formula evaluations. The expert considered the pages more difficult for a consumer than the consumer did. © 2008 Wiley Periodicals, Inc.
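To make the idea of a vocabulary-based naïve Bayes classifier over three difficulty levels concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not describe the features, smoothing, or training data, so everything here is an assumption made for illustration: the whitespace tokenizer, the Laplace smoothing parameter `alpha`, the class name `VocabularyNaiveBayes`, the labels `easy`, `intermediate`, and `hard`, and the six toy training sentences.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, deliberately simple tokenizer)."""
    return text.lower().split()


class VocabularyNaiveBayes:
    """Multinomial naive Bayes over word counts with Laplace smoothing (hypothetical class)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # smoothing strength (assumed value)
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training documents
        self.vocabulary = set()

    def fit(self, documents, labels):
        for text, label in zip(documents, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocabulary.update(tokens)
        return self

    def log_scores(self, text):
        """Return log P(label) + sum of log P(word | label) for each label."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        tokens = [t for t in tokenize(text) if t in self.vocabulary]  # skip unseen words
        scores = {}
        for label in self.doc_counts:
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(self.doc_counts[label] / total_docs)
            for token in tokens:
                score += math.log((counts[token] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy training data, invented purely for illustration.
train_docs = [
    "take the pill with water every day",
    "drink water and rest when you feel sick",
    "high blood pressure can damage the heart and kidneys",
    "diabetes affects how the body uses blood sugar",
    "myocardial infarction results from coronary artery occlusion",
    "renal insufficiency alters pharmacokinetics of antihypertensive agents",
]
train_labels = ["easy", "easy", "intermediate", "intermediate", "hard", "hard"]

model = VocabularyNaiveBayes().fit(train_docs, train_labels)
print(model.predict("coronary artery occlusion"))
```

Because the score depends only on which words appear, a document with short sentences but specialist terms can still be rated difficult, which is the kind of signal a formula based on sentence and word length would miss.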