Health-related information structuring for the semantic web

  • Authors:
  • Mohammad Ali H. Eljinini

  • Affiliations:
  • Isra University, Amman, Jordan

  • Venue:
  • Proceedings of the 2011 International Conference on Intelligent Semantic Web-Services and Applications
  • Year:
  • 2011

Abstract

The World Wide Web has become an important medium for disseminating information on a wide range of topics, and much of human knowledge is rapidly becoming available online. In the medical domain, the number of healthcare-related documents is already large and continues to grow at an exponential rate. Most information on the web is buried inside HTML documents, which are designed for human consumption. Automatically restructuring this information into machine-understandable form and making it available to web search agents would help the web reach its full potential. In this work, we downloaded a set of 100 diabetes-related websites, comprising over 12,000 HTML files, and analyzed them carefully. Our first aim is to learn the general structure of these websites, which would increase the efficiency of information extraction and structuring. Every website has a purpose, mainly providing services, products, or both. Our study resulted in an ontology covering the general services and products that these websites offer. The main goal of this ontology is to guide the process of extracting and structuring information. We incorporated the Unified Medical Language System (UMLS) Semantic Net, which serves as an upper-level ontology for medicine. We used the MetaMap Transfer (MMTx) API, developed by the US National Library of Medicine (NLM), to map text to concepts from the UMLS Semantic Net. Pinpointing concepts in web pages provides an efficient way to determine the attributes and therefore facilitates more efficient extraction and restructuring of information. This paper describes the first part of our work and findings.
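
The concept-pinpointing step described in the abstract can be illustrated with a minimal sketch. It does not use MMTx or the UMLS; instead it assumes a tiny hand-written lexicon (`TOY_LEXICON`, hypothetical) that maps surface terms to semantic type labels, and it uses plain substring matching in place of MMTx's mapping. The helper names `extract_text` and `find_concepts` are also illustrative, not taken from the paper.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for MMTx/UMLS lookups: surface term -> semantic type label.
# The entries are illustrative assumptions, not output from the real tools.
TOY_LEXICON = {
    "diabetes": "Disease or Syndrome",
    "insulin": "Pharmacologic Substance",
    "glucose meter": "Medical Device",
}


class VisibleText(HTMLParser):
    """Collect text that appears outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the human-readable text of an HTML document as one string."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def find_concepts(text, lexicon=TOY_LEXICON):
    """Return sorted (term, semantic_type) pairs for lexicon terms found in text."""
    lowered = text.lower()
    return sorted((term, stype) for term, stype in lexicon.items() if term in lowered)


sample = (
    "<html><head><style>p{}</style></head><body>"
    "<h1>Diabetes care</h1><p>Order a glucose meter online.</p>"
    "<script>var insulin = 1;</script></body></html>"
)
concepts = find_concepts(extract_text(sample))
```

In this sketch the word "insulin" is not reported, because it occurs only inside a script element and never in the visible text. A real pipeline would replace the substring lookup with calls to a concept mapper such as MMTx, which also handles term variants and ambiguity.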