Automatic generation of semantically enriched web pages by a text mining approach

  • Authors:
  • Hsin-Chang Yang

  • Affiliations:
  • Department of Information Management, National University of Kaohsiung, Kaohsiung 811, Taiwan, ROC

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2009

Quantified Score

Hi-index 12.05

Visualization

Abstract

Nowadays most of the Web pages contain little amount of structure and supporting information that can reveal their semantics or meanings. To enable automated processing of the Web pages, semantic information such as metadata and tags regarding to each page should be added to it. Several authoring tools have been developed to help users tackling this task. However, manual or semi-automatic authoring is implausible when we intend to annotate large amount of Web pages. In this work, we proposed a method to automatically generate some descriptive metadata and tags for a Web page. The idea is to apply the self-organizing map algorithm to cluster the Web pages and discover the relationships between these clusters. In the mean time, the themes of each cluster are also identified. We then use such relationships and themes to tag the Web pages and generate metadata for the Web pages. The result of experiments shows that our method may generate semantically relevant metadata and tags for the Web pages.