Toward automated consumer question answering: Automatically separating consumer questions from professional questions in the healthcare domain

Authors:
Feifan Liu;Lamont D. Antieau;Hong Yu
Affiliations:
Department of Health Sciences, University of Wisconsin-Milwaukee, Milwaukee, WI, United States;Department of Health Sciences, University of Wisconsin-Milwaukee, Milwaukee, WI, United States;Department of Health Sciences, University of Wisconsin-Milwaukee, Milwaukee, WI, United States and Department of Computer Science, University of Wisconsin-Milwaukee, Milwaukee, WI, United States
Venue:
Journal of Biomedical Informatics
Year:
2011

Abstract

Objective: Both healthcare professionals and healthcare consumers have information needs that can be met by computers, specifically by medical question answering systems. However, the two groups differ in literacy level and technical expertise, and an effective question answering system must account for these differences if it is to formulate the most relevant responses for users in each group. In this paper, we propose that a first step toward answering the queries of different users is to automatically classify questions according to whether they were asked by healthcare professionals or by consumers.
Design: We obtained two sets of consumer questions (~10,000 questions in total) from Yahoo! Answers. The professional questions come from two collections: 4654 point-of-care questions (denoted PointCare) obtained from interviews with a group of family doctors following patient visits, and 5378 questions from physician practices through professional online services (denoted OnlinePractice). Using the combined set of more than 20,000 questions, we developed supervised machine-learning models to classify questions as consumer or professional. To evaluate the robustness of our models, we tested the model trained on the Consumer-PointCare dataset on the Consumer-OnlinePractice dataset. We evaluated both linguistic and statistical features and examined how the characteristics of the two types of professional questions (PointCare vs. OnlinePractice) affect classification performance. We also explored information gain for feature reduction and back-off linguistic category features.
Results: In 10-fold cross-validation, the best F1-measure was 0.936 on Consumer-PointCare and 0.946 on Consumer-OnlinePractice. The best F1-measure was 0.891 when the Consumer-PointCare model was tested on the Consumer-OnlinePractice dataset.
Conclusion: Healthcare consumer questions posted in Yahoo! online communities can be reliably distinguished from professional questions posed by point-of-care clinicians and online physicians. The supervised machine-learning models are robust for this task. Our study will significantly benefit further development in automated consumer question answering.
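The abstract names two standard quantities without defining them: information gain, used for feature reduction, and the F1-measure, used for evaluation. The following is a minimal sketch of both, not the authors' implementation. The function names, the toy questions, and the choice to model each question as a set of tokens with a binary "term present" feature are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(questions, labels, term):
    """Reduction in label entropy from knowing whether `term` occurs.

    Assumption: each question is a set of tokens, so the feature is binary.
    """
    present = [y for q, y in zip(questions, labels) if term in q]
    absent = [y for q, y in zip(questions, labels) if term not in q]
    total = len(labels)
    conditional = (len(present) / total) * entropy(present) \
        + (len(absent) / total) * entropy(absent)
    return entropy(labels) - conditional


def f1_measure(gold, predicted, positive):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data, invented for this sketch (not from the paper's corpora).
questions = [
    {"my", "stomach", "hurts", "help"},
    {"is", "my", "rash", "normal"},
    {"dose", "of", "metformin", "in", "renal", "failure"},
    {"what", "is", "the", "dose", "of", "warfarin"},
]
labels = ["consumer", "consumer", "professional", "professional"]

# Rank candidate terms by information gain; low-scoring terms would be dropped.
vocabulary = sorted(set().union(*questions))
ranked = sorted(vocabulary,
                key=lambda t: information_gain(questions, labels, t),
                reverse=True)

predicted = ["consumer", "professional", "professional", "professional"]
score = f1_measure(labels, predicted, positive="consumer")
```

In this toy set, "my" and "dose" each separate the two classes perfectly and so receive the maximum gain of 1 bit, while "is" appears once in each class and receives a gain of 0. Feature reduction of this kind would keep only the top-ranked terms before training a classifier.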