A Web text mining approach based on self-organizing map

  • Authors:
  • Chung-Hong Lee;Hsin-Chang Yang

  • Affiliations:
  • Department of Information Management, Chang Jung University, Tainan, Taiwan;Department of Information Management, Chang Jung University, Tainan, Taiwan

  • Venue:
  • Proceedings of the 2nd international workshop on Web information and data management
  • Year:
  • 1999

Abstract

Web text mining is a new topic in the knowledge discovery research field. It aims to help people discover knowledge from large quantities of semi-structured or unstructured text on the web. Several approaches, including pure and hybrid information retrieval (IR) methods, have been proposed to tackle this problem. Among them, combining the Self-Organizing Map (SOM) method with the principles of the vector space model appears to be a promising alternative to traditional purely IR-based methods in this problem domain. In this paper, a novel SOM-based method for web text mining, using a Chinese corpus, is presented. The SOM is used to generate two maps, the word cluster map and the document cluster map, which reveal the relationships among words and among documents, respectively. The search process incorporates both maps and effectively finds the relevant documents according to the keywords specified in the query. Conceptually associated web documents are found not only through the specified keywords but also through the related words identified by the word cluster map.
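The abstract does not give the algorithmic details, so the following is only a minimal sketch of how two such maps might be combined in a search, not the authors' implementation. It assumes binary term vectors, a small rectangular SOM with a Gaussian neighbourhood, words encoded by the documents they occur in, and a toy English corpus standing in for the Chinese one. The names `train_som`, `bmu`, `expand_query`, and `search` are hypothetical.

```python
import math
import random


def bmu(weights, v):
    """Index of the neuron whose weight vector is closest to v (Euclidean)."""
    return min(range(len(weights)),
               key=lambda j: sum((w - x) ** 2 for w, x in zip(weights[j], v)))


def train_som(vectors, rows, cols, epochs=100, seed=0):
    """Train a rows x cols SOM; returns a flat list of weight vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs              # shrinks linearly towards 0
        lr = 0.5 * decay                          # learning rate
        radius = 0.5 + (max(rows, cols) / 2.0) * decay
        for v in vectors:
            br, bc = divmod(bmu(weights, v), cols)
            for j, w in enumerate(weights):
                r, c = divmod(j, cols)
                # Gaussian neighbourhood around the best-matching unit
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * h * (v[k] - w[k])
    return weights


def expand_query(keywords, word_cells):
    """Add every word that shares a word-map neuron with a query keyword."""
    cells = {word_cells[k] for k in keywords if k in word_cells}
    return set(keywords) | {w for w, c in word_cells.items() if c in cells}


def search(keywords, vocab, word_cells, doc_weights, doc_cells):
    """Return ids of documents labelled on the expanded query's best neuron."""
    terms = expand_query(keywords, word_cells)
    query_vec = [1.0 if w in terms else 0.0 for w in vocab]
    cell = bmu(doc_weights, query_vec)
    return sorted(d for d, c in doc_cells.items() if c == cell)


# Toy corpus (assumption: already tokenised; the paper uses Chinese text).
docs = [["web", "mining", "text"], ["text", "mining", "corpus"],
        ["neural", "map", "cluster"], ["map", "cluster", "neuron"]]
vocab = sorted({w for d in docs for w in d})
doc_vecs = [[1.0 if w in d else 0.0 for w in vocab] for d in docs]
# Assumption: a word is represented by the documents it occurs in.
word_vecs = [[1.0 if w in d else 0.0 for d in docs] for w in vocab]

doc_weights = train_som(doc_vecs, 2, 2)      # document cluster map
word_weights = train_som(word_vecs, 2, 2)    # word cluster map
doc_cells = {i: bmu(doc_weights, v) for i, v in enumerate(doc_vecs)}
word_cells = {w: bmu(word_weights, v) for w, v in zip(vocab, word_vecs)}

print(search(["mining"], vocab, word_cells, doc_weights, doc_cells))
```

In this sketch the word cluster map only serves to widen the query with words that land on the same neuron as a keyword, and the document cluster map then returns the documents labelled on the neuron closest to the widened query. Other ways of combining the two maps are equally compatible with the abstract.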