Extracting Topic Maps from Web Pages

  • Authors: Motohiro Mase; Seiji Yamada; Katsumi Nitta

  • Affiliations: Tokyo Institute of Technology, Japan; National Institute of Informatics, Japan; Tokyo Institute of Technology, Japan

  • Venue: New Frontiers in Applied Data Mining
  • Year: 2009

Abstract

We propose a framework for extracting topic maps from a set of Web pages. We cluster the Web pages and extract topic map prototypes from the resulting clusters. We introduce two changes to an existing clustering method. The first is to merge only Web pages that are linked to each other, which extracts the underlying relationships between the topics. The second is to weight each merge by the similarity of the pages' contents and by the relevance between the topics of the pages. The relevance is based on the type of each link with respect to the Web site's directory structure and on the distance between the directories in which the pages are located. We generate the topic map prototypes from the results of the clustering. Finally, users complete a prototype by labeling its topics and associations and removing unnecessary items. As a first step, in this paper we implemented the proposed clustering method and used it to extract prototypes.
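The abstract gives no formulas, so the following is only a minimal sketch of how link-constrained, weighted clustering of this kind might look, not the authors' implementation. Everything specific in it is an assumption made for illustration: content similarity is taken to be cosine similarity over term counts, relevance is taken to be 1 / (1 + directory distance), the two are combined by multiplication, clusters are scored by the average weight of the links joining them, and the `threshold` value and all function names are hypothetical. Link types are not modeled.

```python
from __future__ import annotations

from collections import Counter
from itertools import combinations
from math import sqrt
from posixpath import dirname
from urllib.parse import urlparse


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def directory_parts(url: str) -> list[str]:
    """Directory components of a URL path, e.g. /a/b/x.html -> ['a', 'b']."""
    return [p for p in dirname(urlparse(url).path).split("/") if p]


def directory_distance(u: str, v: str) -> int:
    """Steps between the two pages' directories in the site's directory tree."""
    a, b = directory_parts(u), directory_parts(v)
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


def weight(u: str, v: str, pages: dict[str, Counter]) -> float:
    """Assumed weighting: content similarity scaled by directory-based relevance."""
    return cosine(pages[u], pages[v]) / (1 + directory_distance(u, v))


def cluster_linked_pages(pages, links, threshold=0.1):
    """Agglomerative clustering that only ever merges clusters joined by a link."""
    linked = {frozenset((s, d)) for s, d in links
              if s != d and s in pages and d in pages}
    clusters = [frozenset([u]) for u in pages]

    def score(c1, c2):
        # Average weight over the linked page pairs connecting the two clusters.
        pairs = [(u, v) for u in c1 for v in c2 if frozenset((u, v)) in linked]
        if not pairs:
            return None  # unlinked clusters are never merged
        return sum(weight(u, v, pages) for u, v in pairs) / len(pairs)

    while True:
        best = None
        for c1, c2 in combinations(clusters, 2):
            s = score(c1, c2)
            if s is not None and s >= threshold and (best is None or s > best[0]):
                best = (s, c1, c2)
        if best is None:
            return clusters
        _, c1, c2 = best
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]


def topic_map_prototype(clusters, links):
    """Unlabeled prototype: one topic per cluster, one association per linked cluster pair."""
    topic_of = {u: i for i, c in enumerate(clusters) for u in c}
    pairs = sorted({tuple(sorted((topic_of[s], topic_of[d]))) for s, d in links
                    if s in topic_of and d in topic_of and topic_of[s] != topic_of[d]})
    return {
        "topics": [{"label": None, "pages": sorted(c)} for c in clusters],
        "associations": [{"label": None, "topics": p} for p in pairs],
    }
```

In this sketch, `pages` maps each URL to a `Counter` of its terms and `links` is a collection of `(source, target)` URL pairs. The `label` fields are left as `None` to stand in for the manual step in which users name topics and associations and discard the ones they do not need.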