Assessment of utility in web mining for the domain of public health

Authors:
Peter von Etter;Silja Huttunen;Arto Vihavainen;Matti Vuorinen;Roman Yangarber
Affiliations:
University of Helsinki, Finland;University of Helsinki, Finland;University of Helsinki, Finland;University of Helsinki, Finland;University of Helsinki, Finland
Venue:
Louhi '10 Proceedings of the NAACL HLT 2010 Second Louhi Workshop on Text and Data Mining of Health Documents
Year:
2010

Citing 4
Cited 1

Text Categorization with Support Vector Machines: Learning with Many Relevant Features

ECML '98 Proceedings of the 10th European Conference on Machine Learning
Information extraction for enhanced access to disease outbreak reports

Journal of Biomedical Informatics - Special issue: Sublanguage
Complexity of event structure in IE scenarios

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Confidence estimation for information extraction

HLT-NAACL-Short '04 Proceedings of HLT-NAACL 2004: Short Papers

Building support tools for Russian-language information extraction

TSD'11 Proceedings of the 14th international conference on Text, speech and dialogue

Abstract

This paper presents ongoing work on the application of Information Extraction (IE) technology to the domain of Public Health in a real-world scenario. A central issue in IE is the quality of the results. We present two novel points. First, we distinguish two kinds of quality criteria: objective criteria, which measure the correctness of the system's analysis in traditional terms (precision, recall, and F-measure), and subjective criteria, which measure the utility of the results to the end user. Second, to obtain measures of utility, we build an environment that allows users to interact with the system by rating the analyzed content. We then build and compare several classifiers that learn from the user's responses to predict relevance scores for new events. We conduct experiments on learning to predict relevance, and discuss the results and their implications for text mining in the domain of Public Health.
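
For readers unfamiliar with the objective criteria named in the abstract, the following is a minimal sketch of how precision, recall, and F-measure are conventionally computed over sets of extracted items. It is not taken from the paper: the function name, the event identifiers, and the choice of the balanced F1 variant are assumptions made purely for illustration.

```python
def precision_recall_f1(predicted, gold):
    """Return (precision, recall, F1) for two collections of item IDs.

    Hypothetical helper for illustration; the paper's own evaluation
    setup is not described in the abstract.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)

    # Precision: fraction of extracted items that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of correct items that were extracted.
    recall = true_positives / len(gold) if gold else 0.0

    # Balanced F-measure (F1): harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up event identifiers, not data from the paper.
system_events = {"e1", "e2", "e3", "e4"}
reference_events = {"e1", "e2", "e5"}
p, r, f = precision_recall_f1(system_events, reference_events)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

These scores capture only correctness against a reference annotation. The subjective utility criteria described in the abstract are obtained separately, from user ratings, and are not computed by this sketch.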