Managing knowledge on the Web - Extracting ontology from HTML Web

  • Authors:
  • Timon C. Du;Feng Li;Irwin King

  • Affiliations:
  • Department of Decision Sciences and Managerial Economics, The Chinese University of Hong Kong, Hong Kong;School of Business Administration, South China University of Technology, China;Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong

  • Venue:
  • Decision Support Systems
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

In recent years, the Internet has become one of the most important sources of information, and it is now imperative that companies are able to collect, retrieve, process, and manage information from the Web. However, due to the sheer amount of information available, browsing web content by searches using keywords is inefficient, largely because unstructured HTML web pages are written for human comprehension and not for direct machine processing. For the same reason, the degree of web automation is limited. It is recognized that semantics can enhance web automation, but it will take an indefinite amount of effort to convert the current HTML Web into the Semantic Web. This study proposes a novel ontology extractor, called OntoSpider, for extracting ontology from the HTML Web. The contribution of this work is the design and implementation of a six-phase process that includes the preparation, transformation, clustering, recognition, refinement, and revision for extracting ontology from unstructured HTML pages. The extracted ontology provides structured and relevant information for applications such as e-commerce and knowledge management that can be compared and analyzed more effectively. We give detailed information on the system and provide a series of experimental results that validate the system design and illustrate the effectiveness of OntoSpider.