Automatically generated consumer health metadata using semantic spaces

  • Authors: Guocai Chen; Jim Warren; Joanne Evans

  • Affiliations: The University of Auckland, New Zealand; The University of Auckland, New Zealand; Monash University, Australia

  • Venue: HDKM '08 Proceedings of the second Australasian workshop on Health data and knowledge management - Volume 80
  • Year: 2008

Abstract

The continual growth of the World Wide Web presents the (also growing) population of health information seekers with the challenge of finding reliable information that is appropriate to their needs. Metadata about consumer health websites can provide a guide for end users and domain-specific search tools. In this paper we present and demonstrate a method for automatically inferring a non-trivial metadata attribute that has been encoded for breast cancer websites: whether the site is 'medical' or 'supportive' in tone. We induce decision trees to distinguish Medical vs. Supportive sites based on feature vectors of word co-occurrence patterns, derived from a semantic space model called Hyperspace Analog to Language (HAL). We achieve 82% (95% CI: 74% to 91%) classification accuracy. This should already be a useful capability for assisting human metadata coders or supporting on-the-fly queries, and it inspires us to further investigate metadata classifiers based on HAL features.
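
As a rough illustration of the kind of word co-occurrence features HAL provides, the following Python sketch builds a HAL-style weighted co-occurrence table with a sliding window. It is a minimal sketch under stated assumptions, not the authors' implementation: the window size of 5, the linear distance weighting (window - distance + 1), the whitespace tokenization, and the names hal_matrix and row_vector are all illustrative choices that the abstract does not specify.

```python
from collections import defaultdict


def hal_matrix(tokens, window=5):
    """Build a HAL-style table of weighted co-occurrences.

    counts[target][neighbour] accumulates a weight each time `neighbour`
    appears within `window` positions BEFORE `target`. Closer neighbours
    get larger weights: window - distance + 1 (an assumed weighting).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for i, target in enumerate(tokens):
        for distance in range(1, window + 1):
            j = i - distance
            if j < 0:
                break  # ran off the start of the text
            counts[target][tokens[j]] += window - distance + 1
    return counts


def row_vector(counts, word, vocabulary):
    """Turn one word's co-occurrence row into a fixed-order feature vector."""
    row = counts.get(word, {})
    return [row.get(v, 0) for v in vocabulary]


# Hypothetical toy input; a real site would contribute far more text.
tokens = "the horse raced past the barn fell".split()
counts = hal_matrix(tokens, window=5)
vocabulary = sorted(set(tokens))

print(dict(counts["barn"]))
# → {'the': 6, 'past': 4, 'raced': 3, 'horse': 2}
print(vocabulary, row_vector(counts, "barn", vocabulary))
```

Vectors of this kind, computed for selected words over a site's text, could then be passed to any decision tree learner together with the Medical/Supportive labels; which words are selected and how the vectors are combined per site are details this sketch leaves open.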