Extracting meaningful labels for WEBSOM text archives

Authors:
Arnulfo P. Azcarraga;Teddy N. Yap, Jr.
Affiliations:
National University of Singapore, Singapore;De La Salle University, Manila, Philippines
Venue:
Proceedings of the tenth international conference on Information and knowledge management
Year:
2001

Citing 7

Automatic text processing: the transformation, analysis, and retrieval of information by computer

A self-organizing semantic map for information retrieval
SIGIR '91 Proceedings of the 14th annual international ACM SIGIR conference on Research and development in information retrieval

Feature selection, perceptron learning, and a usability case study for text categorization
Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval

Websom for Textual Data Mining
Artificial Intelligence Review - Special issue on data mining on the Internet

An Evaluation of Statistical Approaches to Text Categorization
Information Retrieval

Mining Text Archives: Creating Readable Maps to Structure and Describe Document Collections
PKDD '99 Proceedings of the Third European Conference on Principles of Data Mining and Knowledge Discovery

SOM-Based Methodology for Building Large Text Archives
DASFAA '01 Proceedings of the 7th International Conference on Database Systems for Advanced Applications

Cited 4

Retrieving News Stories from a News Integration Archive
ICADL '02 Proceedings of the 5th International Conference on Asian Digital Libraries: Digital Libraries: People, Knowledge, and Technology

Evaluating Keyword Selection Methods for WEBSOM Text Archives
IEEE Transactions on Knowledge and Data Engineering

Adaptive topological tree structure for document organisation and visualisation
Neural Networks - 2004 Special issue: New developments in self-organizing systems

Mining dynamic document spaces with massively parallel embedded processors
SAMOS'06 Proceedings of the 6th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation

Abstract

Self-Organizing Maps (SOMs) are used mainly with data that are not pre-labeled, so they need automatic procedures for extracting keywords to serve as labels for each map unit. The WEBSOM methodology for building very large text archives extracts such unit labels with a very slow method: it computes the relative frequencies of all the words in all the documents associated with each unit, then compares these with the relative frequencies of all the words in all the other units of the map. Since maps may have more than 100,000 units and the archive may contain up to 7 million documents, the existing WEBSOM method is not practical. This paper describes how meaningful labels for each map unit can be deduced by analyzing the relative weight distribution of the SOM weight vectors and by taking advantage of some characteristics of the random projection method used for dimensionality reduction. The effectiveness of this technique is demonstrated on archives of the well-studied Reuters and CNN collections. Comparisons with the WEBSOM method are provided.
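
To make the cost of the frequency-based labeling concrete, the following is a minimal Python sketch of the kind of comparison the abstract describes: word relative frequencies are computed per map unit and then compared against every other unit. The abstract does not give the exact scoring formula, so the ratio used here, along with the names `relative_frequencies`, `label_units`, `top_k`, and the toy `unit_docs` data, are assumptions made purely for illustration and are not the WEBSOM implementation.

```python
from collections import Counter

def relative_frequencies(docs):
    """Relative frequency of each word over all documents mapped to one unit."""
    counts = Counter(word for doc in docs for word in doc.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

def label_units(unit_docs, top_k=2):
    """Pick top_k label words per unit by comparing against all other units."""
    freqs = {u: relative_frequencies(docs) for u, docs in unit_docs.items()}
    labels = {}
    for u, f in freqs.items():
        scores = {}
        for w, p in f.items():
            # Assumed score (not from the paper): frequency in this unit
            # divided by the summed frequency of the word in all other units.
            elsewhere = sum(freqs[v].get(w, 0.0) for v in freqs if v != u)
            scores[w] = p / (elsewhere + 1e-9)
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        labels[u] = ranked[:top_k]
    return labels

# Hypothetical toy archive: two map units, two short documents each.
unit_docs = {
    (0, 0): ["oil prices rise", "oil exports fall"],
    (0, 1): ["wheat harvest prices", "wheat exports rise"],
}
labels = label_units(unit_docs, top_k=1)
```

In this sketch every word of every unit is checked against every other unit, so the work grows roughly with the square of the number of units times the vocabulary size, which gives a sense of why such a comparison becomes expensive on maps with more than 100,000 units.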