Mining text documents for thematic hierarchies using self-organizing maps

Authors:
Hsin-Chang Yang; Chung-Hong Lee
Affiliations:
Chang Jung University, Taiwan; Chang Jung University, Taiwan
Venue:
Data mining
Year:
2003

Abstract

Recently, many approaches have been devised for mining various kinds of knowledge from texts. One important application of text mining is identifying themes, and the semantic relations among them, for text categorization. Traditionally, these themes were arranged hierarchically to support effective searching and indexing and to make them easy for people to comprehend. The category themes and their hierarchical structure were mostly determined by human experts. In this work, we developed an approach that automatically generates category themes and reveals the hierarchical structure among them. We also used the generated structure to categorize text documents. A self-organizing map was trained on the document collection to form two feature maps. We then analyzed these maps to obtain the category themes and their structure. Although the test corpus contains documents written in Chinese, the proposed approach can be applied to documents written in any language, provided that such documents can be transformed into a list of separated terms.
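
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: train a small self-organizing map on term vectors and read two labelings off the trained weights, one for documents and one for words. Everything specific here is an assumption made for illustration and not taken from the paper: the binary term vectors, the 2x2 grid, the linear decay schedules, the toy English corpus, and the two labeling rules (a document goes to its best matching neuron; a word goes to the neuron whose weight for that word is largest).

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position whose weight vector is closest to x."""
    return min(weights,
               key=lambda pos: sum((a - b) ** 2 for a, b in zip(weights[pos], x)))


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM (hypothetical settings, not the paper's)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One weight vector per neuron, keyed by its (row, col) grid position.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                     # decaying learning rate
        radius = max(radius0 * (1.0 - frac), 0.5)   # shrinking neighborhood
        for x in vectors:
            bmu = best_matching_unit(weights, x)
            for pos, w in weights.items():
                d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * radius * radius))
                # Pull each neuron toward x, weighted by grid distance to the BMU.
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


def label_documents(weights, vectors):
    """Assumed rule: a document is labeled to its best matching neuron."""
    return [best_matching_unit(weights, x) for x in vectors]


def label_words(weights, vocab):
    """Assumed rule: a word is labeled to the neuron with the largest weight for it."""
    return {word: max(weights, key=lambda pos: weights[pos][i])
            for i, word in enumerate(vocab)}


# Toy corpus: each document is already a list of separated terms.
docs = [
    ["som", "map", "neuron"],
    ["map", "neuron", "cluster"],
    ["stock", "market", "price"],
    ["market", "price", "trade"],
]
vocab = sorted({term for doc in docs for term in doc})
vectors = [[1.0 if term in doc else 0.0 for term in vocab] for doc in docs]

weights = train_som(vectors, rows=2, cols=2)
doc_map = label_documents(weights, vectors)
word_map = label_words(weights, vocab)

print(doc_map)
print(word_map)
```

Documents and words that land on the same or neighboring neurons would then be candidates for a shared theme. How the paper actually derives themes and their hierarchy from the two maps is not stated in the abstract and is not reproduced here.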