Text classification for assisting moderators in online health communities

Authors:
Jina Huh;Meliha Yetisgen-Yildiz;Wanda Pratt
Affiliations:
Department of Telecommunication, Information Studies, and Media, Michigan State University, 404 Wilson Rd, Rm 409, East Lansing, MI 48864, USA;Department of Biomedical Informatics and Medical Education, School of Medicine, University of Washington, 1959 NE Pacific Street, Seattle, WA 98195, USA;Department of Biomedical Informatics and Medical Education, School of Medicine, University of Washington, 1959 NE Pacific Street, Seattle, WA 98195, USA and The Information School, University of W ...
Venue:
Journal of Biomedical Informatics
Year:
2013

Abstract

Objectives: Patients increasingly visit online health communities for help managing their health. These communities are too large for moderators to engage in every conversation, yet some conversations need their expertise. Our work applies low-cost text classification methods to a new problem: determining whether a thread in an online health forum needs moderators' help.

Methods: We trained a binary classifier on data from WebMD's online diabetes community. We considered three feature types: (1) word unigrams, (2) sentiment analysis features, and (3) thread length. We applied feature selection methods based on χ² statistics, and undersampling to account for unbalanced data. We then performed a qualitative error analysis to investigate the appropriateness of the gold standard.

Results: Using sentiment analysis features, feature selection methods, and balanced training data raised the AUC to as much as 0.75 and the F1-score to as much as 0.54, compared with a baseline of word unigrams with no feature selection on unbalanced data (0.65 AUC and 0.40 F1-score). The error analysis uncovered additional reasons why moderators respond to patients' posts.

Discussion: We showed how feature selection methods and balanced training data can improve overall classification performance. We present the implications of weighing precision against recall when assisting moderators of online health communities. Our error analysis uncovered social, legal, and ethical issues around addressing community members' needs. We also note challenges in producing a gold standard and discuss potential solutions for addressing them.

Conclusion: Social media environments are popular venues in which patients obtain health-related information. Our work contributes to understanding scalable solutions for providing moderators' expertise in these large-scale social media environments.
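The abstract names two preprocessing steps, χ² feature selection over word unigrams and undersampling of the majority class, without giving implementation details. The following is a minimal sketch of both steps in plain Python, not the authors' implementation. The function names (`chi_square`, `rank_unigrams`, `undersample`), the whitespace tokenizer, and the six toy threads with their labels are all assumptions made for illustration; a label of 1 stands for "thread needs a moderator".

```python
import random
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: positive threads containing the term
    n10: negative threads containing the term
    n01: positive threads without the term
    n00: negative threads without the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denom == 0:
        return 0.0  # term or class is constant, so it carries no signal
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def rank_unigrams(threads, labels):
    """Return unigrams sorted by descending chi-square score (ties: alphabetical)."""
    pos_total = sum(labels)
    neg_total = len(labels) - pos_total
    pos_df, neg_df = Counter(), Counter()
    for text, y in zip(threads, labels):
        # Count each word once per thread (document frequency).
        for word in set(text.lower().split()):
            (pos_df if y else neg_df)[word] += 1
    scores = {}
    for word in set(pos_df) | set(neg_df):
        n11, n10 = pos_df[word], neg_df[word]
        scores[word] = chi_square(n11, n10, pos_total - n11, neg_total - n10)
    return sorted(scores, key=lambda w: (-scores[w], w))


def undersample(threads, labels, seed=0):
    """Randomly drop majority-class threads until both classes are the same size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [threads[i] for i in keep], [labels[i] for i in keep]


# Hypothetical toy data, invented for this sketch only.
threads = [
    "is my insulin dose too high",
    "thanks everyone for the support",
    "happy birthday to our friend",
    "should i change my insulin dose",
    "good morning all",
    "welcome to the group",
]
labels = [1, 0, 0, 1, 0, 0]

top_terms = rank_unigrams(threads, labels)[:3]
balanced_threads, balanced_labels = undersample(threads, labels)
```

In a pipeline like the one the abstract describes, undersampling would be applied to the training split only, so that evaluation still reflects the real class imbalance, and the top-ranked unigrams would then be passed to the classifier as its vocabulary.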