A Web text mining approach based on self-organizing map

Authors:
Chung-Hong Lee; Hsin-Chang Yang
Affiliations:
Department of Information Management, Chang Jung University, Tainan, Taiwan (both authors)
Venue:
Proceedings of the 2nd international workshop on Web information and data management
Year:
1999

Abstract

Web text mining is a relatively new research problem in the field of knowledge discovery. It aims to help people discover knowledge in the large quantities of semi-structured and unstructured text available on the Web. Several approaches, including pure and hybrid information retrieval (IR) methods, have been proposed to tackle this problem. Among them, combining the Self-Organizing Map (SOM) with the principles of the vector space model appears to be a promising alternative to traditional, purely IR-based methods in this domain. This paper presents a novel SOM-based method for Web text mining that operates on a Chinese corpus. The SOM is used to generate two maps, the word cluster map and the document cluster map, which reveal the relationships among words and among documents, respectively. The search process incorporates both maps and effectively finds the documents relevant to the keywords specified in a query. Conceptually associated Web documents are found not only through the query keywords themselves but also through related words identified by the word cluster map.
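The abstract describes the pipeline only at a high level. The Python sketch below is a minimal, hypothetical illustration of that pipeline, not the authors' implementation. It rests on several assumptions: documents are already tokenized (Chinese text would first need word segmentation); each document is encoded as a binary term-presence vector (TF-IDF weighting is a common alternative); the SOM uses a Gaussian neighbourhood with a linearly shrinking learning rate and radius; each word is placed on the word cluster map at the neuron whose weight for that word is largest; and the retrieval rule (expand the query with words sharing a word-cluster neuron, then report matching documents grouped by their document-cluster neuron) is invented for illustration. The toy corpus and every function name are hypothetical.

```python
import math
import random

def build_vocabulary(docs):
    """Sorted list of the distinct terms across all tokenized documents."""
    return sorted({term for doc in docs for term in doc})

def to_vector(doc, vocab):
    """Assumed binary vector-space encoding: 1.0 if the term occurs in the document."""
    present = set(doc)
    return [1.0 if term in present else 0.0 for term in vocab]

def best_matching_unit(weights, x):
    """Grid position of the neuron whose weight vector is closest (Euclidean) to x."""
    return min(weights, key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], x)))

def train_som(vectors, rows, cols, epochs=200, lr0=0.5, seed=0):
    """Train a rows x cols SOM; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2.0
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1.0 - frac)                      # linearly decaying learning rate
        radius = max(radius0 * (1.0 - frac), 0.5)    # shrinking neighbourhood radius
        for x in rng.sample(vectors, len(vectors)):  # visit inputs in random order
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * radius ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])       # pull weights toward input
    return weights

def document_cluster_map(weights, vectors):
    """Label each neuron with the indices of the documents it wins."""
    labels = {node: [] for node in weights}
    for j, x in enumerate(vectors):
        labels[best_matching_unit(weights, x)].append(j)
    return labels

def word_cluster_map(weights, vocab):
    """Hypothetical labelling: each word goes to the neuron with the largest weight for it."""
    labels = {node: [] for node in weights}
    for i, term in enumerate(vocab):
        labels[max(weights, key=lambda n: weights[n][i])].append(term)
    return labels

def search(query, docs, word_map, doc_map):
    """Hypothetical retrieval rule combining both maps."""
    expanded = set(query)
    for words in word_map.values():           # query expansion via the word cluster map
        if any(q in words for q in query):
            expanded.update(words)
    groups = {}
    for node, doc_ids in doc_map.items():     # group hits by document-cluster neuron
        hits = [j for j in doc_ids if expanded & set(docs[j])]
        if hits:
            groups[node] = hits
    return expanded, groups

# Hypothetical toy corpus of pre-tokenized documents.
docs = [
    ["som", "neural", "map", "cluster"],
    ["som", "map", "visualization"],
    ["retrieval", "query", "index"],
    ["query", "keyword", "retrieval", "ranking"],
    ["cluster", "neural", "training"],
    ["web", "mining", "text"],
]
vocab = build_vocabulary(docs)
vectors = [to_vector(d, vocab) for d in docs]
weights = train_som(vectors, rows=3, cols=3)
doc_map = document_cluster_map(weights, vectors)
word_map = word_cluster_map(weights, vocab)
expanded, groups = search(["som"], docs, word_map, doc_map)
print(sorted(expanded))
print(groups)
```

Because the SOM starts from random weights, exactly which words end up sharing a neuron depends on the seed and the corpus. Documents that contain a query keyword are always retrieved under this rule; the expansion step can additionally surface documents that contain only words the map placed beside that keyword, which is the kind of association the abstract attributes to the word cluster map.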