Semantic Space models for classification of consumer webpages on metadata attributes

  • Authors: Guocai Chen; Jim Warren; Patricia Riddle

  • Affiliations: Department of Computer Science, The University of Auckland, New Zealand; Department of Computer Science, The University of Auckland, New Zealand and School of Population Health, The University of Auckland, New Zealand; Department of Computer Science, The University of Auckland, New Zealand

  • Venue: Journal of Biomedical Informatics
  • Year: 2010

Abstract

Online healthcare resources raise problems of both quantity and quality. One strategy for dealing with them is to create web portals centred on particular health topics and/or communities of users, providing access to a reduced corpus of information resources that meet quality and relevance criteria. In this paper we use hyperspace analogue to language (HAL) to model the language use patterns of webpages as Semantic Spaces. We apply machine learning methods, including support vector machine (SVM), decision forest, and a novel summed similarity measure (SSM), to automatically classify webpages based on their Semantic Space models. Using webpages of the Breast Cancer Knowledge Online portal, we find classification accuracy on metadata attributes to be over 93% for 'medical' versus 'supportive' perspective, over 92% for disease stage of 'early' versus 'advanced', and over 90% for author credentials of 'lay' versus 'clinician'. These results indicate that language use patterns can be used to automate such classification with useful levels of accuracy.
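As a rough illustration of how a HAL-style Semantic Space can be built from page text, the following minimal sketch slides a window over a token sequence and accumulates distance-weighted co-occurrence counts. The function name `hal_space`, the whitespace tokenisation, the window size, and the linear weighting (window - distance + 1) are assumptions made for this example; the abstract does not state the settings used in the paper, and the SSM classifier is not reproduced here.

```python
from collections import defaultdict


def hal_space(tokens, window=5):
    """Build a HAL-style co-occurrence space (hypothetical helper).

    Returns a dict mapping (word, preceding_word) to an accumulated weight.
    """
    space = defaultdict(float)
    for i, word in enumerate(tokens):
        # Look back at up to `window` tokens that precede the current word.
        for distance in range(1, window + 1):
            j = i - distance
            if j < 0:
                break
            # Assumed linear weighting: closer neighbours contribute more.
            space[(word, tokens[j])] += window - distance + 1
    return dict(space)


# Toy input; real pages would need proper tokenisation and stop-word handling.
tokens = "early stage breast cancer treatment".split()
space = hal_space(tokens, window=2)
```

Each page's space can then be flattened into a feature vector for a classifier such as an SVM; how the paper does this step is not described in the abstract.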