Stemming algorithms: a case study for detailed evaluation
Journal of the American Society for Information Science - Special issue: evaluation of information retrieval systems
Using analytic QP and sparseness to speed training of support vector machines
Proceedings of the 1998 conference on Advances in neural information processing systems II
Modern Information Retrieval
Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition
A Comparative Study on Feature Selection in Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Consumers of e-health: patterns of use and barriers
Social Science Computer Review - Special issue: Sociology and computing
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Question classification with support vector machines and error correcting codes
NAACL-Short '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume of the Proceedings of HLT-NAACL 2003--short papers - Volume 2
Parsing and question classification for question answering
ODQA '01 Proceedings of the workshop on Open-domain question answering - Volume 12
HITIQA: an interactive question answering system a preliminary report
MultiSumQA '03 Proceedings of the ACL 2003 workshop on Multilingual summarization and question answering - Volume 12
Question classification using HDAG kernel
MultiSumQA '03 Proceedings of the ACL 2003 workshop on Multilingual summarization and question answering - Volume 12
Finding high-quality content in social media
WSDM '08 Proceedings of the 2008 International Conference on Web Search and Data Mining
Exploring question subjectivity prediction in community QA
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Evaluation of the clinical question answering presentation
BioNLP '09 Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
HITIQA: a data driven approach to interactive analytical question answering
HLT-NAACL-Short '04 Proceedings of HLT-NAACL 2004: Short Papers
Text classification for assisting moderators in online health communities
Journal of Biomedical Informatics
Hi-index | 0.00 |
Objective: Both healthcare professionals and healthcare consumers have information needs that can be met through the use of computers, specifically via medical question answering systems. However, the information needs of both groups are different in terms of literacy levels and technical expertise, and an effective question answering system must be able to account for these differences if it is to formulate the most relevant responses for users from each group. In this paper, we propose that a first step toward answering the queries of different users is automatically classifying questions according to whether they were asked by healthcare professionals or consumers. Design: We obtained two sets of consumer questions (~10,000 questions in total) from Yahoo answers. The professional questions consist of two question collections: 4654 point-of-care questions (denoted as PointCare) obtained from interviews of a group of family doctors following patient visits and 5378 questions from physician practices through professional online services (denoted as OnlinePractice). With more than 20,000 questions combined, we developed supervised machine-learning models for automatic classification between consumer questions and professional questions. To evaluate the robustness of our models, we tested the model that was trained on the Consumer-PointCare dataset on the Consumer-OnlinePractice dataset. We evaluated both linguistic features and statistical features and examined how the characteristics in two different types of professional questions (PointCare vs. OnlinePractice) may affect the classification performance. We explored information gain for feature reduction and the back-off linguistic category features. Results: The 10-fold cross-validation results showed the best F1-measure of 0.936 and 0.946 on Consumer-PointCare and Consumer-OnlinePractice respectively, and the best F1-measure of 0.891 when testing the Consumer-PointCare model on the Consumer-OnlinePractice dataset. Conclusion: Healthcare consumer questions posted at Yahoo online communities can be reliably classified from professional questions posted by point-of-care clinicians and online physicians. The supervised machine-learning models are robust for this task. Our study will significantly benefit further development in automated consumer question answering.