Automatic Recognition of Text Difficulty from Consumers Health Information

Authors:
Yunli Wang
Affiliations:
National Research Council Canada
Venue:
CBMS '06 Proceedings of the 19th IEEE Symposium on Computer-Based Medical Systems
Year:
2006

Citing 0
Cited 3

Automatic Retrieval of Web Pages with Standards of Ethics and Trustworthiness Within a Medical Portal: What a Page Name Tells Us
AIME '07 Proceedings of the 11th conference on Artificial Intelligence in Medicine

Application of Cross-Language Criteria for the Automatic Distinction of Expert and Non Expert Online Health Documents
AIME '07 Proceedings of the 11th conference on Artificial Intelligence in Medicine

Assessing user-specific difficulty of documents
Information Processing and Management: an International Journal

Quantified Score

Hi-index	0.00

Abstract

The Internet is one of the major sources of health information. However, some studies show that the health information presented on health web sites is difficult for many consumers to read. Readability formulas usually measure the difficulty of writing style rather than the difficulty of content. In order to recommend health information at an appropriate reading level to consumers, we investigate the feasibility of identifying the text difficulty of health information using machine learning methods. A Support Vector Machine is used to classify consumer health information into two classes: easy to read, and written at a reading level for the general public. Three feature sets (surface linguistic features, word difficulty features, and unigrams) and their combinations are compared in terms of classification accuracy. Unigram features alone reach an accuracy of 80.71%, and the combination of all three feature sets is the most effective, with an accuracy of 84.06%. Both are significantly better than surface linguistic features, word difficulty features, and the combination of those two.
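
The three feature sets named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact features, so everything below is an assumption made for illustration: the function names, the two surface measures (average sentence length and average word length), the "hard word ratio", and the tiny EASY_WORDS list are all hypothetical stand-ins, not the paper's actual features or resources. The sketch covers feature extraction only; the resulting feature dictionaries would still need to be vectorized and passed to a Support Vector Machine implementation for classification.

```python
import re
from collections import Counter

# Hypothetical stand-in for a list of familiar words. The paper's actual
# word difficulty resource is not described in the abstract.
EASY_WORDS = {"the", "a", "is", "of", "and", "to", "in", "your",
              "heart", "blood", "can", "cause", "pain"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def split_sentences(text):
    """Split on runs of sentence-ending punctuation, dropping empty pieces."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def surface_features(text):
    """Assumed surface linguistic features: sentence and word length."""
    words = tokenize(text)
    sentences = split_sentences(text)
    return {
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
    }


def word_difficulty_features(text):
    """Assumed word difficulty feature: share of words not in EASY_WORDS."""
    words = tokenize(text)
    hard = sum(1 for w in words if w not in EASY_WORDS)
    return {"hard_word_ratio": hard / max(len(words), 1)}


def unigram_features(text):
    """Unigram counts, keyed with a prefix to avoid name clashes."""
    return {f"uni={w}": c for w, c in Counter(tokenize(text)).items()}


def combined_features(text):
    """Merge the three feature sets into one feature dictionary."""
    feats = {}
    feats.update(surface_features(text))
    feats.update(word_difficulty_features(text))
    feats.update(unigram_features(text))
    return feats


if __name__ == "__main__":
    sample = "The heart pumps blood. Hypertension can cause pain."
    for name, value in sorted(combined_features(sample).items()):
        print(name, value)
```

Keeping each feature set in its own function mirrors the comparison described in the abstract: any single set, or any combination of sets, can be built by choosing which functions to merge.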