Evaluating Keyword Selection Methods for WEBSOM Text Archives

Authors:
Arnulfo P. Azcarraga; Teddy N. Yap; Jonathan Tan; Tat Seng Chua
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2004

Abstract

The WEBSOM methodology, proven effective for building very large text archives, includes a method that extracts labels for each document cluster assigned to a node in the map. However, that method must retrieve every word of every document associated with each node. Since maps may have more than 100,000 nodes and the archive may contain up to seven million documents, the WEBSOM methodology needs a faster alternative for keyword selection. This paper presents such an alternative, which quickly deduces meaningful labels for each node in the map. It does so solely by analyzing the relative weight distribution of the SOM weight vectors and by exploiting certain characteristics of the random projection method used for dimensionality reduction. The effectiveness of the technique is demonstrated on news document collections.
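
The general idea of reading keywords off a node's weight vector, rather than off its documents, can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not give the scoring rule, so the sketch assumes that each term is projected through a sparse random code (a few positions set to 1) and that a term is scored by the smallest weight found at its code positions. The names `make_projection`, `project`, `score_terms`, and `top_keywords` are hypothetical, and a projected document stands in for a trained SOM weight vector.

```python
import random


def make_projection(vocab_size, dim, ones_per_term=3, seed=0):
    """Give each term a sparse random code: a few positions in the reduced space.

    Assumption: the random projection matrix is binary and sparse.
    """
    rng = random.Random(seed)
    return [sorted(rng.sample(range(dim), ones_per_term)) for _ in range(vocab_size)]


def project(term_counts, codes, dim):
    """Project a {term_id: count} document into the reduced space."""
    vec = [0.0] * dim
    for term, count in term_counts.items():
        for pos in codes[term]:
            vec[pos] += count
    return vec


def score_terms(weight_vector, codes):
    """Score each term by the smallest weight at its code positions.

    Assumption: taking the minimum (rather than the mean) suppresses terms
    that merely share one position with a heavily weighted term.
    """
    return [min(weight_vector[pos] for pos in code) for code in codes]


def top_keywords(weight_vector, codes, vocab, k=3):
    """Return the k highest-scoring terms as candidate labels for a node."""
    scores = score_terms(weight_vector, codes)
    ranked = sorted(range(len(vocab)), key=lambda t: scores[t], reverse=True)
    return [vocab[t] for t in ranked[:k]]


# Hand-written codes keep the example deterministic; make_projection would
# normally generate them at random.
vocab = ["election", "market", "football", "weather"]
codes = [[0, 1], [2, 3], [4, 5], [0, 4]]

# Stand-in for a trained node weight vector: one projected document.
node_weights = project({0: 5, 1: 1}, codes, dim=6)
print(top_keywords(node_weights, codes, vocab, k=2))
```

Under these assumptions the cost per node depends only on the vocabulary size and the code length, not on the number of documents mapped to the node, which is the kind of saving the abstract describes.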